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Abstract 

Compressive  sensing  is  a  signal  acquisition  framework  based  on  the  revelation  that  a  small  col¬ 
lection  of  linear  projections  of  a  sparse  signal  contains  enough  information  for  stable  recovery. 

In  this  paper  we  introduce  a  new  theory  for  distributed  compressive  sensing  (DCS)  that  en¬ 
ables  new  distributed  coding  algorithms  for  multi-signal  ensembles  that  exploit  both  intra-  and 
inter-signal  correlation  structures.  The  DCS  theory  rests  on  a  new  concept  that  we  term  the 
joint  sparsity  of  a  signal  ensemble.  Our  theoretical  contribution  is  to  characterize  the  funda¬ 
mental  performance  limits  of  DCS  recovery  for  jointly  sparse  signal  ensembles  in  the  noiseless 
measurement  setting;  our  result  connects  single-signal,  joint,  and  distributed  (multi-encoder) 
compressive  sensing.  To  demonstrate  the  efficacy  of  our  framework  and  to  show  that  additional 
challenges  such  as  computational  tractability  can  be  addressed,  we  study  in  detail  three  example 
models  for  jointly  sparse  signals.  For  these  models,  we  develop  practical  algorithms  for  joint 
recovery  of  multiple  signals  from  incoherent  projections.  In  two  of  our  three  models,  the  results 
are  asymptotically  best-possible,  meaning  that  both  the  upper  and  lower  bounds  match  the 
performance  of  our  practical  algorithms.  Moreover,  simulations  indicate  that  the  asymptotics 
take  effect  with  just  a  moderate  number  of  signals.  DCS  is  immediately  applicable  to  a  range 
of  problems  in  sensor  arrays  and  networks. 

Keywords:  Compressive  sensing,  distributed  source  coding,  sparsity,  random  projection,  random  matrix, 
linear  programming,  array  processing,  sensor  networks. 


1  Introduction 

A  core  tenet  of  signal  processing  and  information  theory  is  that  signals,  images,  and  other  data  often 
contain  some  type  of  structure  that  enables  intelligent  representation  and  processing.  The  notion 
of  structure  has  been  characterized  and  exploited  in  a  variety  of  ways  for  a  variety  of  purposes.  In 
this  paper,  we  focus  on  exploiting  signal  correlations  for  the  purpose  of  compression. 
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ference  on  Signals,  Systems  and  Computers  [2],  the  Conference  on  Neural  Information  Processing  Systems  [3],  and 
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Current  state-of-the-art  compression  algorithms  employ  a  decorrelating  transform  such  as  an 
exact  or  approximate  Karhunen-Loeve  transform  (KLT)  to  compact  a  correlated  signal’s  energy 
into  just  a  few  essential  coefficients  [5-7].  Such  transform  coders  exploit  the  fact  that  many  signals 
have  a  sparse  representation  in  terms  of  some  basis,  meaning  that  a  small  number  K  of  adaptively 
chosen  transform  coefficients  can  be  transmitted  or  stored  rather  than  N  3>  K  signal  samples.  For 
example,  smooth  signals  are  sparse  in  the  Fourier  basis,  and  piecewise  smooth  signals  are  sparse 
in  a  wavelet  basis  [8];  the  commercial  coding  standards  MP3  [9],  JPEG  [10],  and  JPEG2000  [11] 
directly  exploit  this  sparsity. 

1.1  Distributed  source  coding 

While  the  theory  and  practice  of  compression  have  been  well  developed  for  individual  signals, 
distributed  sensing  applications  involve  multiple  signals,  for  which  there  has  been  less  progress. 
Such  settings  are  motivated  by  the  proliferation  of  complex,  multi-signal  acquisition  architectures, 
such  as  acoustic  and  RF  sensor  arrays,  as  well  as  sensor  networks.  These  architectures  sometimes 
involve  battery-powered  devices,  which  restrict  the  communication  energy,  and  high  aggregate  data 
rates,  limiting  bandwidth  availability;  both  factors  make  the  reduction  of  communication  critical. 

Fortunately,  since  the  sensors  presumably  observe  related  phenomena,  the  ensemble  of  signals 
they  acquire  can  be  expected  to  possess  some  joint  structure,  or  inter-signal  correlation ,  in  addition 
to  the  intra-signal  correlation  within  each  individual  sensor’s  measurements.  In  such  settings, 
distributed  source  coding  that  exploits  both  intra-  and  inter-signal  correlations  might  allow  the 
network  to  save  on  the  communication  costs  involved  in  exporting  the  ensemble  of  signals  to  the 
collection  point  [12-15].  A  number  of  distributed  coding  algorithms  have  been  developed  that 
involve  collaboration  amongst  the  sensors  [16-19].  Note,  however,  that  any  collaboration  involves 
some  amount  of  inter-sensor  communication  overhead. 

In  the  Slepian-  Wolf  framework  for  lossless  distributed  coding  [12-15],  the  availability  of  corre¬ 
lated  side  information  at  the  decoder  (collection  point)  enables  each  sensor  node  to  communicate 
losslessly  at  its  conditional  entropy  rate  rather  than  at  its  individual  entropy  rate,  as  long  as  the  sum 
rate  exceeds  the  joint  entropy  rate.  Slepian- Wolf  coding  has  the  distinct  advantage  that  the  sen¬ 
sors  need  not  collaborate  while  encoding  their  measurements,  which  saves  valuable  communication 
overhead.  Unfortunately,  however,  most  existing  coding  algorithms  [14, 15]  exploit  only  inter-signal 
correlations  and  not  intra-signal  correlations.  To  date  there  has  been  only  limited  progress  on 
distributed  coding  of  so-called  “sources  with  memory.”  The  direct  implementation  for  sources  with 
memory  would  require  huge  lookup  tables  [12].  Furthermore,  approaches  combining  pre-  or  post¬ 
processing  of  the  data  to  remove  intra-signal  correlations  combined  with  Slepian- Wolf  coding  for 
the  inter-signal  correlations  appear  to  have  limited  applicability,  because  such  processing  would 
alter  the  data  in  a  way  that  is  unknown  to  other  nodes.  Finally,  although  recent  papers  [20-22] 
provide  compression  of  spatially  correlated  sources  with  memory,  the  solution  is  specific  to  lossless 
distributed  compression  and  cannot  be  readily  extended  to  lossy  compression  settings.  We  conclude 
that  the  design  of  constructive  techniques  for  distributed  coding  of  sources  with  both  intra-  and 
inter-signal  correlation  is  a  challenging  problem  with  many  potential  applications. 

1.2  Compressive  sensing  (CS) 

A  new  framework  for  single-signal  sensing  and  compression  has  developed  under  the  rubric  of 
compressive  sensing  (CS).  CS  builds  on  the  work  of  Candes,  Romberg,  and  Tao  [23]  and  Donoho  [24], 
who  showed  that  if  a  signal  has  a  sparse  representation  in  one  basis  then  it  can  be  recovered  from 
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a  small  number  of  projections  onto  a  second  basis  that  is  incoherent  with  the  first.1  CS  relies  on 
tractable  recovery  procedures  that  can  provide  exact  recovery  of  a  signal  of  length  N  and  sparsity 
K ,  i.e. ,  a  signal  that  can  be  written  as  a  sum  of  K  basis  functions  from  some  known  basis,  where 
K  can  be  orders  of  magnitude  less  than  N . 

The  implications  of  CS  are  promising  for  many  applications,  especially  for  sensing  signals  that 
have  a  sparse  representation  in  some  basis.  Instead  of  sampling  a  JC-sparse  signal  N  times,  only 
M  =  0(K  log  N)  incoherent  measurements  suffice,  where  K  can  be  orders  of  magnitude  less  than 
N.  Moreover,  the  M  measurements  need  not  be  manipulated  in  any  way  before  being  transmitted, 
except  possibly  for  some  quantization.  Finally,  independent  and  identically  distributed  (i.i.d.) 
Gaussian  or  Bernoulli/Rademacher  (random  ±1)  vectors  provide  a  useful  universal  basis  that  is 
incoherent  with  all  others.  Hence,  when  using  a  random  basis,  CS  is  universal  in  the  sense  that  the 
sensor  can  apply  the  same  measurement  mechanism  no  matter  what  basis  sparsifies  the  signal  [27]. 

While  powerful,  the  CS  theory  at  present  is  designed  mainly  to  exploit  intra-signal  structures 
at  a  single  sensor.  In  a  multi-sensor  setting,  one  can  naively  obtain  separate  measurements  from 
each  signal  and  recover  them  separately.  However,  it  is  possible  to  obtain  measurements  that  each 
depend  on  all  signals  in  the  ensemble  by  having  sensors  collaborate  with  each  other  in  order  to 
combine  all  of  their  measurements;  we  term  this  process  a  joint  measurement  setting.  In  fact,  initial 
work  in  CS  for  multi-sensor  settings  used  standard  CS  with  joint  measurement  and  recovery  schemes 
that  exploit  inter-signal  correlations  [28-32].  However,  by  recovering  sequential  time  instances  of 
the  sensed  data  individually,  these  schemes  ignore  intra-signal  correlations. 

1.3  Distributed  compressive  sensing  (DCS) 

In  this  paper  we  introduce  a  new  theory  for  distributed  compressive  sensing  (DCS)  to  enable  new 
distributed  coding  algorithms  that  exploit  both  intra-  and  inter-signal  correlation  structures.  In 
a  typical  DCS  scenario,  a  number  of  sensors  measure  signals  that  are  each  individually  sparse  in 
some  basis  and  also  correlated  from  sensor  to  sensor.  Each  sensor  separately  encodes  its  signal 
by  projecting  it  onto  another,  incoherent  basis  (such  as  a  random  one)  and  then  transmits  just  a 
few  of  the  resulting  coefficients  to  a  single  collection  point.  Unlike  the  joint  measurement  setting 
described  in  Section  1.2,  DCS  requires  no  collaboration  between  the  sensors  during  signal  acquisi¬ 
tion.  Nevertheless,  we  are  able  to  exploit  the  inter-signal  correlation  by  using  all  of  the  obtained 
measurements  to  recover  all  the  signals  simultaneously.  Under  the  right  conditions,  a  decoder  at 
the  collection  point  can  recover  each  of  the  signals  precisely. 

The  DCS  theory  rests  on  a  concept  that  we  term  the  joint  sparsity  —  the  sparsity  of  the 
entire  signal  ensemble.  The  joint  sparsity  is  often  smaller  than  the  aggregate  over  individual  signal 
sparsities.  Therefore,  DCS  offers  a  reduction  in  the  number  of  measurements,  in  a  manner  analogous 
to  the  rate  reduction  offered  by  the  Slepian-Wolf  framework  [13].  Unlike  the  single-signal  definition 
of  sparsity,  however,  there  are  numerous  plausible  ways  in  which  joint  sparsity  could  be  defined.  In 
this  paper,  we  first  provide  a  general  framework  for  joint  sparsity  using  algebraic  formulations  based 
on  a  graphical  model.  Using  this  framework,  we  derive  bounds  for  the  number  of  measurements 
necessary  for  recovery  under  a  given  signal  ensemble  model.  Similar  to  Slepian-Wolf  coding  [13],  the 
number  of  measurements  required  for  each  sensor  must  account  for  the  minimal  features  unique  to 
that  sensor,  while  at  the  same  time  features  that  appear  among  multiple  sensors  must  be  amortized 
over  the  group.  Our  bounds  are  dependent  on  the  dimensionality  of  the  subspaces  in  which  each 
group  of  signals  reside;  they  afford  a  reduction  in  the  number  of  measurements  that  we  quantify 
through  the  notions  of  joint  and  conditional  sparsity,  which  are  conceptually  related  to  joint  and 

1  Roughly  speaking,  incoherence  means  that  no  element  of  one  basis  has  a  sparse  representation  in  terms  of  the 
other  basis.  This  notion  has  a  variety  of  formalizations  in  the  CS  literature  [24-26] . 
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conditional  entropies.  The  common  thread  is  that  dimensionality  and  entropy  both  quantify  the 
volume  that  the  measurement  and  coding  rates  must  cover.  Our  results  are  also  applicable  to  cases 
where  the  signal  ensembles  are  measured  jointly,  as  well  as  to  the  single-signal  case. 

While  our  general  framework  does  not  by  design  provide  insights  for  computationally  efficient 
recovery,  we  also  provide  interesting  models  for  joint  sparsity  where  our  results  carry  through  from 
the  general  framework  to  realistic  settings  with  low-complexity  algorithms.  In  the  first  model, 
each  signal  is  itself  sparse,  and  so  we  could  use  CS  to  separately  encode  and  decode  each  signal. 
However,  there  also  exists  a  framework  wherein  a  joint  sparsity  model  for  the  ensemble  uses  fewer 
total  coefficients.  In  the  second  model,  all  signals  share  the  locations  of  the  nonzero  coefficients. 
In  the  third  model,  no  signal  is  itself  sparse,  yet  there  still  exists  a  joint  sparsity  among  the  signals 
that  allows  recovery  from  significantly  fewer  than  N  measurements  per  sensor.  For  each  model 
we  propose  tractable  algorithms  for  joint  signal  recovery,  followed  by  theoretical  and  empirical 
characterizations  of  the  number  of  measurements  per  sensor  required  for  accurate  recovery.  We 
show  that,  under  these  models,  joint  signal  recovery  can  recover  signal  ensembles  from  significantly 
fewer  measurements  than  would  be  required  to  recover  each  signal  individually.  In  fact,  for  two  of 
our  three  models  we  obtain  best-possible  performance  that  could  not  be  bettered  by  an  oracle  that 
knew  the  the  indices  of  the  nonzero  entries  of  the  signals. 

This  paper  focuses  primarily  on  the  basic  task  of  reducing  the  number  of  measurements  for 
recovery  of  a  signal  ensemble  in  order  to  reduce  the  communication  cost  of  source  coding  that 
ensemble.  Our  emphasis  is  on  noiseless  measurements  of  strictly  sparse  signals,  where  the  optimal 
recovery  relies  on  ^o-norm  optimization,2  which  is  computationally  intractable.  In  practical  settings, 
additional  criteria  may  be  relevant  for  measuring  performance.  For  example,  the  measurements  will 
typically  be  real  numbers  that  must  be  quantized,  which  gradually  degrades  the  recovery  quality  as 
the  quantization  becomes  coarser  [33,34],  Characterizing  DCS  in  light  of  practical  considerations 
such  as  rate-distortion  tradeoffs,  power  consumption  in  sensor  networks,  etc.,  are  topics  of  future 
research  [31,32], 

1.4  Paper  organization 

Section  2  overviews  the  single-signal  CS  theories  and  provides  a  new  result  on  CS  recovery.  While 
some  readers  may  be  familiar  with  this  material,  we  include  it  to  make  the  paper  self-contained. 
Section  3  introduces  our  general  framework  for  joint  sparsity  models  and  proposes  three  example 
models  for  joint  sparsity.  We  provide  our  detailed  analysis  for  the  general  framework  in  Section  4; 
we  then  address  the  three  models  in  Section  5.  We  close  the  paper  with  a  discussion  and  conclusions 
in  Section  6.  Several  appendices  contain  the  proofs. 

2  Compressive  Sensing  Background 

2.1  Transform  coding 

Consider  a  real-valued  signal3  x  £  M.N  indexed  as  x(n),  n  £  { 1,2,...,  N}.  Suppose  that  the  basis 
T  =  [-01 , . . . ,  0jv]  provides  a  K -sparse  representation  of  x;  that  is, 

N  I< 

X  =  V’nfc,  (1) 

n=l  fc=l 

2The  £o  “norm”  ||x||o  merely  counts  the  number  of  nonzero  entries  in  the  vector  x. 

3Without  loss  of  generality,  we  will  focus  on  one-dimensional  signals  (vectors)  for  notational  simplicity;  the  exten¬ 
sion  to  multi-dimensional  signal,  e.g.,  images,  is  straightforward. 
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where  x  is  a  linear  combination  of  K  vectors  chosen  from  T,  {n^}  are  the  indices  of  those  vectors, 
and  {$(n)}  are  the  coefficients;  the  concept  is  extendable  to  tight  frames  [8].  Alternatively,  we  can 
write  in  matrix  notation  x  =  TT?,  where  x  is  an  TV  x  1  column  vector,  the  sparse  basis  matrix  T 
is  TV  x  TV  with  the  basis  vectors  ifn  as  columns,  and  i?  is  an  TV  x  1  column  vector  with  K  nonzero 
elements.  Using  ||  •  ||p  to  denote  the  lp  norm,  we  can  write  that  ||$||o  =  K;  we  can  also  write  the 
set  of  nonzero  indices  Q  C  {1, . . . ,  TV},  with  |fi|  =  K.  Various  expansions,  including  wavelets  [8], 
Gabor  bases  [8],  curvelets  [35],  etc.,  are  widely  used  for  representation  and  compression  of  natural 
signals,  images,  and  other  data. 

The  standard  procedure  for  compressing  sparse  and  nearly-sparse  signals,  known  as  transform 
coding,  is  to  (?)  acquire  the  full  TV-sample  signal  x:  (ii)  compute  the  complete  set  of  transform 
coefficients  {$(n)};  (in)  locate  the  K  largest,  significant  coefficients  and  discard  the  (many)  small 
coefficients;  (iv)  encode  the  values  and  locations  of  the  largest  coefficients.  This  procedure  has  three 
inherent  inefficiencies:  First,  for  a  high-dimensional  signal,  we  must  start  with  a  large  number  of 
samples  TV.  Second,  the  encoder  must  compute  all  of  the  TV  transform  coefficients  {$(n)},  even 
though  it  will  discard  all  but  K  of  them.  Third,  the  encoder  must  encode  the  locations  of  the  large 
coefficients,  which  requires  increasing  the  coding  rate  since  the  locations  change  with  each  signal. 

We  will  focus  our  theoretical  development  on  exactly  iv-sparse  signals  and  defer  discussion  of 
the  more  general  situation  of  compressible  signals  where  the  coefficients  decay  rapidly  with  a  power 
law  but  not  to  zero.  Section  6  contains  additional  discussion  on  real-world  compressible  signals, 
and  [36]  presents  simulation  results. 

2.2  Incoherent  projections 

These  inefficiencies  raise  a  simple  question:  For  a  given  signal,  is  it  possible  to  directly  estimate 
the  set  of  large  d(n)  ’s  that  will  not  be  discarded?  While  this  seems  improbable,  Candes,  Romberg, 
and  Tao  [23,  25]  and  Donoho  [24]  have  shown  that  a  reduced  set  of  projections  can  contain  enough 
information  to  recover  sparse  signals.  A  framework  to  acquire  sparse  signals,  often  referred  to  as 
compressive  sensing  (CS)  [37],  has  emerged  that  builds  on  this  principle. 

In  CS,  we  do  not  measure  or  encode  the  K  significant  d(n)  directly.  Rather,  we  measure 
and  encode  M  <  TV  projections  y(m)  =  ( x ,  4>m)  °f  the  signal  onto  a  second  set  of  functions 
{4>m},  m  =  1,  2, . . . ,  M,  where  4>Jn  denotes  the  transpose  of  cj>m  and  (■,  ■)  denotes  the  inner  product. 
In  matrix  notation,  we  measure  y  =  4>x,  where  y  is  an  M  x  1  column  vector  and  the  measurement 
matrix  4>  is  M  x  TV  with  each  row  a  measurement  vector  (f>m.  Since  M  <  TV,  recovery  of  the  signal 
x  from  the  measurements  y  is  ill-posed  in  general;  however  the  additional  assumption  of  signal 
sparsity  makes  recovery  possible  and  practical. 

The  CS  theory  tells  us  that  when  certain  conditions  hold,  namely  that  the  basis  {ifn}  cannot 
sparsely  represent  the  vectors  {4>m}  (a  condition  known  as  incoherence  [24-26])  and  the  number 
of  measurements  M  is  large  enough  (proportional  to  K),  then  it  is  indeed  possible  to  recover  the 
set  of  large  {$(n)|  (and  thus  the  signal  x)  from  the  set  of  measurements  {y(m) }  [24,25].  This 
incoherence  property  holds  for  many  pairs  of  bases,  including  for  example,  delta  spikes  and  the  sine 
waves  of  a  Fourier  basis,  or  the  Fourier  basis  and  wavelets.  Signals  that  are  sparsely  represented 
in  frames  or  unions  of  bases  can  be  recovered  from  incoherent  measurements  in  the  same  fashion. 
Significantly,  this  incoherence  also  holds  with  high  probability  between  any  arbitrary  fixed  basis 
or  frame  and  a  randomly  generated  one.  In  the  sequel,  we  will  focus  our  analysis  to  such  random 
measurement  procedures. 
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2.3  Signal  recovery  via  £o-norm  minimization 

The  recovery  of  the  sparse  set  of  significant  coefficients  {■d(n)}  can  be  achieved  using  optimization 
by  searching  for  the  signal  with  the  sparsest  coefficient  vector  {$(n)}  that  agrees  with  the  M 
observed  measurements  in  y  (recall  that  M  <  N).  Recovery  relies  on  the  key  observation  that, 
under  mild  conditions  on  and  T,  the  coefficient  vector  d  is  the  unique  solution  to  the  f'o-norm 
minimization 

d  =  argmin  ||i?||o  s.t.  y  =  (2) 

with  overwhelming  probability.  (Thanks  to  the  incoherence  between  the  two  bases,  if  the  original 
signal  is  sparse  in  the  d  coefficients,  then  no  other  set  of  sparse  signal  coefficients  d'  can  yield  the 
same  projections  y.) 

In  principle,  remarkably  few  incoherent  measurements  are  required  to  recover  a  K -sparse  signal 
via  Iq -norm  minimization.  More  than  K  measurements  must  be  taken  to  avoid  ambiguity;  the 
following  theorem,  proven  in  Appendix  A,  establishes  that  K  + 1  random  measurements  will  suffice. 
Similar  results  were  established  by  Venkataramani  and  Bresler  [38]. 

Theorem  1  Let  T  be  an  orthonormal  basis  for  and  let  1  <  K  <  N.  Then: 

1.  Let  <f>  be  an  M  x  N  measurement  matrix  with  i.i.d.  Gaussian  entries  with  M  >  2 K .  Then  all 
signals  x  =  ltd  having  expansion  coefficients  d  G  lw  that  satisfy  ||i?||o  =  K  can  be  recovered 
uniquely  from  the  M -dimensional  measurement  vector  y  =  via  the  £o-norm  minimization 
(2)  with  probability  one  over  <f>. 

2.  Let  x  =  Tt?  such  that  ||i?||o  =  K.  Let  <J>  be  an  M  x  N  measurement  matrix  with  i.i.d. 
Gaussian  entries  (notably,  independent  of  x)  with  M  >  K  +  1 .  Then  x  can  be  recovered 
uniquely  from  the  M -dimensional  measurement  vector  y  =  <f?£  via  the  £o-norm  minimization 
(2)  with  probability  one  over  <f>. 

3.  Let  &  be  an  M  x  N  measurement  matrix,  where  M  <  K .  Then,  aside  from  pathological  cases 
(specified  in  the  proof),  no  signal  x  =  T-d  with  ||i?||o  =  K  can  be  uniquely  recovered  from  the 
M -dimensional  measurement  vector  y  =  <J>£\ 

Remark  1  The  second  statement  of  the  theorem  differs  from  the  first  in  the  following  respect:  when 
K  <  M  <  2K ,  there  will  necessarily  exist  K -sparse  signals  x  that  cannot  be  uniquely  recovered  from 
the  M -dimensional  measurement  vector  y  =  However,  these  signals  form  a  set  of  measure  zero 
within  the  set  of  all  K -sparse  signals  and  can  safely  be  avoided  with  high  probability  if  T  is  randomly 
generated  independently  of  x. 

Comparing  the  second  and  third  statements  of  Theorem  1,  we  see  that  one  measurement 
separates  the  achievable  region ,  where  perfect  recovery  is  possible  with  probability  one,  from  the 
converse  region,  where  with  overwhelming  probability  recovery  is  impossible.  Moreover,  Theorem  1 
provides  a  strong  converse  measurement  region  in  a  manner  analogous  to  the  strong  channel  coding 
converse  theorems  of  information  theory  [12], 

Unfortunately,  solving  the  Aj-norm  minimization  problem  is  prohibitively  complex,  requiring  a 
combinatorial  enumeration  of  the  (^)  possible  sparse  subspaces.  In  fact,  the  Aj-norrn  minimization 
problem  in  general  is  known  to  be  NP-hard  [39].  Yet  another  challenge  is  robustness;  in  the  setting 
of  Theorem  1,  the  recovery  may  be  very  poorly  conditioned.  In  fact,  both  of  these  considerations 
(computational  complexity  and  robustness)  can  be  addressed,  but  at  the  expense  of  slightly  more 
measurements. 
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2.4  Signal  recovery  via  4-norm  minimization 

The  practical  revelation  that  supports  the  new  CS  theory  is  that  it  is  not  necessary  to  solve  the 
t'o-norm  minimization  to  recover  the  set  of  significant  {i?(n)}.  In  fact,  a  much  easier  problem  yields 
an  equivalent  solution  (thanks  again  to  the  incoherence  of  the  bases);  we  need  only  solve  for  the 
smallest  fi-norm  coefficient  vector  $  that  agrees  with  the  measurements  y  [24,  25]: 

l)  =  argmin  ||$||i  s.t.  y  =  TTd.  (3) 

This  optimization  problem,  also  known  as  Basis  Pursuit,  is  significantly  more  approachable  and 
can  be  solved  with  traditional  linear  programming  techniques  whose  computational  complexities 
are  polynomial  in  N. 

There  is  no  free  lunch,  however;  according  to  the  theory,  more  than  K  +  1  measurements 
are  required  in  order  to  recover  sparse  signals  via  Basis  Pursuit.  Instead,  one  typically  requires 
M  >  cK  measurements,  where  c  >  1  is  an  overmeasuring  factor.  As  an  example,  we  quote  a 
result  asymptotic  in  N.  For  simplicity,  we  assume  that  the  sparsity  scales  linearly  with  N ;  that  is, 
K  =  SN,  where  we  call  S  the  sparsity  rate. 

Theorem  2  [39-41]  Set  K  =  SN  with  0  <  S  «  1.  Then  there  exists  an  overmeasuring  fac¬ 
tor  c(S)  =  0(log(l/5)),  c(S)  >  1,  such  that,  for  a  K -sparse  signal  x  in  basis  T,  the  following 
statements  hold: 

1.  The  probability  of  recovering  x  via  i\-norm  minimization  from  ( c(S)+e)K  random  projections, 
e  >  0,  converges  to  one  as  N  — *  oo. 

2.  The  probability  of  recovering  x  via  l\-norm  minimization  from  ( c(S)—e)K  random  projections, 
e  >  0,  converges  to  zero  as  N  — >  oo. 

In  an  illuminating  series  of  papers,  Donoho  and  Tanner  [40-42]  have  characterized  the  over¬ 
measuring  factor  c(S)  precisely.  In  our  work,  we  have  noticed  that  the  overmeasuring  factor  is  quite 
similar  to  log2(l  +  We  find  this  expression  a  useful  rule  of  thumb  to  approximate  the  precise 

overmeasuring  ratio.  Additional  overmeasuring  is  proven  to  provide  robustness  to  measurement 
noise  and  quantization  error  [25]. 

Throughout  this  paper  we  use  the  abbreviated  notation  c  to  describe  the  overmeasuring  factor 
required  in  various  settings  even  though  c(S)  depends  on  the  sparsity  K  and  signal  length  N. 

2.5  Signal  recovery  via  greedy  pursuit 

Iterative  greedy  algorithms  have  also  been  developed  to  recover  the  signal  x  from  the  measurements 
y.  The  Orthogonal  Matching  Pursuit  (OMP)  algorithm,  for  example,  iteratively  selects  the  vectors 
from  the  matrix  4>T  that  contain  most  of  the  energy  of  the  measurement  vector  y.  The  selection 
at  each  iteration  is  made  based  on  inner  products  between  the  columns  of  4)>I'  and  a  residual;  the 
residual  reflects  the  component  of  y  that  is  orthogonal  to  the  previously  selected  columns.  The 
algorithm  has  been  proven  to  successfully  recover  the  acquired  signal  from  incoherent  measurements 
with  high  probability,  at  the  expense  of  slightly  more  measurements,  [26, 43].  Algorithms  inspired  by 
OMP,  such  as  regularized  orthogonal  matching  pursuit  [44],  CoSaMP  [45],  and  Subspace  Pursuit  [46] 
have  been  shown  to  attain  similar  guarantees  to  those  of  their  optimization-based  counterparts.  In 
the  following,  we  will  exploit  both  Basis  Pursuit  and  greedy  algorithms  for  recovering  jointly  sparse 
signals  from  incoherent  measurements. 
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2.6  Properties  of  random  measurements 

In  addition  to  offering  substantially  reduced  measurement  rates,  CS  has  many  attractive  and  in¬ 
triguing  properties,  particularly  when  we  employ  random  projections  at  the  sensors.  Random 
measurements  are  universal  in  the  sense  that  any  sparse  basis  can  be  used,  allowing  the  same  en¬ 
coding  strategy  to  be  applied  in  different  sensing  environments.  Random  measurements  are  also 
future-proof:  if  a  better  sparsity-inducing  basis  is  found  for  the  signals,  then  the  same  measurements 
can  be  used  to  recover  a  more  accurate  view  of  the  environment.  Random  coding  is  also  robust:  the 
measurements  coming  from  each  sensor  have  equal  priority,  unlike  Fourier  or  wavelet  coefficients  in 
current  coders.  Finally,  random  measurements  allow  a  progressively  better  recovery  of  the  data  as 
more  measurements  are  obtained;  one  or  more  measurements  can  also  be  lost  without  corrupting 
the  entire  recovery. 

2.7  Related  work 

Several  researchers  have  formulated  joint  measurement  settings  for  CS  in  sensor  networks  that 
exploit  inter-signal  correlations  [28-32],  In  their  approaches,  each  sensor  n  €  {1,2, . . . , N}  simul¬ 
taneously  records  a  single  reading  x(n)  of  some  spatial  field  (temperature  at  a  certain  time,  for 
example).4  Each  of  the  sensors  generates  a  pseudorandom  sequence  rn(m),m  =  1,2, .. . ,  M,  and 
modulates  the  reading  as  x(n)rn(m).  Each  sensor  n  then  transmits  its  M  numbers  in  sequence  to 
the  collection  point  where  the  measurements  are  aggregated,  obtaining  M  measurements  y(m)  = 
J2n=ix(7l)rn(m)-  Thus,  defining  x  =  [x(l),  x(2), . . .  ,x(N)]T  and  4>m  =  [rjm),  r2(m), . . . ,  rjv(m)], 
the  collection  point  automatically  receives  the  measurement  vector  y  =  [y(l),  y{2), . . .  ,y(M)]T  = 
after  M  transmission  steps.  The  samples  x(n)  of  the  spatial  field  can  then  be  recovered  using 
CS  provided  that  x  has  a  sparse  representation  in  a  known  basis.  These  methods  have  a  major 
limitation:  since  they  operate  at  a  single  time  instant,  they  exploit  only  inter-signal  and  not  intra¬ 
signal  correlations;  that  is,  they  essentially  assume  that  the  sensor  held  is  i.i.d.  from  time  instant 
to  time  instant.  In  contrast,  we  will  develop  signal  models  and  algorithms  that  are  agnostic  to  the 
spatial  sampling  structure  and  that  exploit  both  inter-  and  intra-signal  correlations. 

Recent  work  has  adapted  DCS  to  the  finite  rate  of  innovation  signal  acquisition  framework  [47] 
and  to  the  continuous-time  setting  [48].  Since  the  original  submission  of  this  paper,  additional  work 
has  focused  on  the  analysis  and  proposal  of  recovery  algorithms  for  jointly  sparse  signals  [49,50]. 

3  Joint  Sparsity  Signal  Models 

In  this  section,  we  generalize  the  notion  of  a  signal  being  sparse  in  some  basis  to  the  notion  of  an 
ensemble  of  signals  being  jointly  sparse. 

3.1  Notation 

We  will  use  the  following  notation  for  signal  ensembles  and  our  measurement  model.  Let  A  := 
{1,  2  ,...,</}  denote  the  set  of  indices  for  the  J  signals  in  the  ensemble.  Denote  the  signals  in  the 
ensemble  by  Xj.  with  j  G  A  and  assume  that  each  signal  Xj  €  M.N .  We  use  Xj(n)  to  denote  sample 
n  in  signal  j,  and  assume  for  the  sake  of  illustration  —  but  without  loss  of  generality  —  that  these 
signals  are  sparse  in  the  canonical  basis,  i.e. ,  T  =  I.  The  entries  of  the  signal  can  take  arbitrary 
real  values. 

We  denote  by  <Fj  the  measurement  matrix  for  signal  j;  4>j  is  Mj  x  N  and,  in  general,  the 
entries  of  are  different  for  each  j .  Thus,  yj  =  QjXj  consists  of  Mj  <  N  random  measurements 

4Note  that  in  Section  2.7  only,  N  refers  to  the  number  of  sensors,  since  each  sensor  acquires  a  signal  sample. 


of  Xj.  We  will  emphasize  random  i.i.d.  Gaussian  matrices  in  the  following,  but  other  schemes 
are  possible,  including  random  ±1  Bernoulli/Rademacher  matrices,  and  so  on. 

To  compactly  represent  the  signal  and  measurement  ensembles,  we  denote  M  =  Y^jeA  Mj  and 

define  X  G  RJN ,  Y  G  Mm,  and  $  G  MMxJ7V  as 


(4) 


with  0  denoting  a  matrix  of  appropriate  size  with  all  entries  equal  to  0.  We  then  have  Y  = 
&X.  Equation  (4)  shows  that  separate  measurement  matrices  have  a  characteristic  block-diagonal 
structure  when  the  entries  of  the  sparse  vector  are  grouped  by  signal. 

Below  we  propose  a  general  framework  for  joint  sparsity  models  (JSMs)  and  three  example 
JSMs  that  apply  in  different  situations. 


3.2  General  framework  for  joint  sparsity 

We  now  propose  a  general  framework  to  quantify  the  sparsity  of  an  ensemble  of  correlated  signals 
X\,X2 which  allows  us  to  compare  the  complexities  of  different  signal  ensembles  and  to 
quantify  their  measurement  requirements.  The  framework  is  based  on  a  factored  representation  of 
the  signal  ensemble  that  decouples  its  location  and  value  information. 

To  motivate  this  factored  representation,  we  begin  by  examining  the  structure  of  a  single  sparse 
signal,  where  x  G  with  K  <C  N  nonzero  entries.  As  an  alternative  to  the  notation  used  in  (1), 
we  can  decouple  the  location  and  value  information  in  x  by  writing  x  =  P6,  where  6  G  contains 
only  the  nonzero  entries  of  x,  and  P  is  an  identity  submatrix,  i.e. ,  P  contains  K  columns  of  the 
N  x  N  identity  matrix  I.  Any  K- sparse  signal  can  be  written  in  similar  fashion.  To  model  the 
set  of  all  possible  sparse  signals,  we  can  then  let  V  be  the  set  of  all  identity  submatrices  of  all 
possible  sizes  N  x  K' ,  with  1  <  K'  <  N .  We  refer  to  V  as  a  sparsity  model.  Whether  a  signal 
is  sufficiently  sparse  is  defined  in  the  context  of  this  model :  given  a  signal  x,  one  can  consider  all 
possible  factorizations  x  =  Pd  with  ?G?.  Among  these  factorizations,  the  unique  representation 
with  smallest  dimensionality  for  6  equals  the  sparsity  level  of  the  signal  x  under  the  model  V. 

In  the  signal  ensemble  case,  we  consider  factorizations  of  the  form  X  =  PQ  where  X  G 
as  above,  P  G  M.JNxS,  and  0  G  Is  for  various  integers  6.  We  refer  to  P  and  0  as  the  location 
matrix  and  value  vector,  respectively.  A  joint  sparsity  model  (JSM)  is  defined  in  terms  of  a  set  V 
of  admissible  location  matrices  P  with  varying  numbers  of  columns;  we  specify  below  additional 
conditions  that  the  matrices  P  must  satisfy  for  each  model.  For  a  given  ensemble  X,  we  let 
Vp-(X)  C  V  denote  the  set  of  feasible  location  matrices  P  G  V  for  which  a  factorization  X  =  PQ 
exists.  We  define  the  joint  sparsity  level  of  the  signal  ensemble  as  follows. 

Definition  1  The  joint  sparsity  level  D  of  the  signal  ensemble  X  is  the  number  of  columns  of  the 
smallest  matrix  P  G  Vf(X). 

In  contrast  to  the  single-signal  case,  there  are  several  natural  choices  for  what  matrices  P 
should  be  members  of  a  joint  sparsity  model  V.  We  restrict  our  attention  in  the  sequel  to  what 
we  call  common/innovation  component  JSMs.  In  these  models  each  signal  Xj  is  generated  as  a 
combination  of  two  components:  ( i )  a  common  component  zc,  which  is  present  in  all  signals,  and 
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(ii)  an  innovation  component  Zj,  which  is  unique  to  each  signal.  These  combine  additively,  giving 

Xj=zc  +  zj,  j  E  A. 

Note,  however,  that  the  individual  components  might  be  zero-valued  in  specific  scenarios.  We  can 
express  the  component  signals  as 

zc  =  Pc'Oc,  Zj  =  PjOj,  J  e  A, 

where  9  c  E  and  each  9j  E  WLKj  have  nonzero  entries.  Each  matrix  P  E  V  that  can  express 

such  signals  {xj}  has  the  form 

Pc  Pi  0  •  •  •  0 

Pc  0  P>  ■  ■  ■  0 

P  (5) 

_PC  0  0  ...  Pj  _ 

where  Pc,  {Pj}j& a  are  identity  submatrices.  We  define  the  value  vector  as  0  =  [Oj,  9\  6 J  ...  0Tj]T , 
where  9c  €  WKc  and  each  9j  E  WK* ,  to  obtain  X  =  P0.  Although  the  values  of  Kq  and  Kj 
are  dependent  on  the  matrix  P,  we  omit  this  dependency  in  the  sequel  for  brevity,  except  when 
necessary  for  clarity. 

If  a  signal  ensemble  X  =  P0,  0  €  M*5  were  to  be  generated  by  a  selection  of  Pc  and  {Pj}je a, 
where  all  J  + 1  identity  submatrices  share  a  common  column  vector,  then  P  would  not  be  full  rank. 
In  other  cases,  we  may  observe  a  vector  0  that  has  zero- valued  entries;  i.e.,  we  may  have  9j(k)  =  0 
for  some  1  <  k  <  Kj  and  some  j  E  A,  or  9c(k)  =  0  for  some  1  <  k  <  I\c-  In  both  of  these  cases, 
by  removing  one  instance  of  this  column  from  any  of  the  identity  submatrices,  one  can  obtain  a 
matrix  Q  with  fewer  columns  for  which  there  exists  Q'  E  M*5-1  that  gives  X  =  QQ' .  If  Q  E  P, 
then  we  term  this  phenomenon  sparsity  reduction.  Sparsity  reduction,  when  present,  reduces  the 
effective  joint  sparsity  of  a  signal  ensemble.  As  an  example  of  sparsity  reduction,  consider  J  =  2 
signals  of  length  N  =  2.  Consider  the  coefficient  zc(  1)  ^  0  of  the  common  component  zc  and  the 
corresponding  innovation  coefficients  zi(l), ^(l)  /  0.  Suppose  that  all  other  coefficients  are  zero. 
The  location  matrix  P  that  arises  is 

'110' 

OOO 
10  1' 

.00°. 

The  span  of  this  location  matrix  (i.e.,  the  set  of  signal  ensembles  X  that  it  can  generate)  remains 
unchanged  if  we  remove  any  one  of  the  columns,  i.e.,  if  we  drop  any  entry  of  the  value  vector  0. 
This  provides  us  with  a  lower-dimensional  representation  0;  of  the  same  signal  ensemble  X  under 
the  JSM  P;  the  joint  sparsity  of  X  is  D  =  2. 

3.3  Example  joint  sparsity  models 

Since  different  real-world  scenarios  lead  to  different  forms  of  correlation  within  an  ensemble  of  sparse 
signals,  we  consider  several  possible  designs  for  a  JSM  V.  The  distinctions  among  our  three  JSMs 
concern  the  differing  sparsity  assumptions  regarding  the  common  and  innovation  components. 
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3.3.1  JSM-1:  Sparse  common  component  +  innovations 

In  this  model,  we  suppose  that  each  signal  contains  a  common  component  zc  that  is  sparse  plus 
an  innovation  component  Zj  that  is  also  sparse.  Thus,  this  joint  sparsity  model  (JSM-1)  V  is 
represented  by  the  set  of  all  matrices  of  the  form  (5)  with  I\q  and  all  Kj  smaller  than  N.  Assuming 
that  sparsity  reduction  is  not  possible,  the  joint  sparsity  D  =  Kq  +  Ylje A  • 

A  practical  situation  well-modeled  by  this  framework  is  a  group  of  sensors  measuring  temper¬ 
atures  at  a  number  of  outdoor  locations  throughout  the  day.  The  temperature  readings  Xj  have 
both  temporal  (intra-signal)  and  spatial  (inter-signal)  correlations.  Global  factors,  such  as  the  sun 
and  prevailing  winds,  could  have  an  effect  zc  that  is  both  common  to  all  sensors  and  structured 
enough  to  permit  sparse  representation.  More  local  factors,  such  as  shade,  water,  or  animals,  could 
contribute  localized  innovations  Zj  that  are  also  structured  (and  hence  sparse).  A  similar  scenario 
could  be  imagined  for  a  network  of  sensors  recording  light  intensities,  air  pressure,  or  other  phenom¬ 
ena.  All  of  these  scenarios  correspond  to  measuring  properties  of  physical  processes  that  change 
smoothly  in  time  and  in  space  and  thus  are  highly  correlated  [51,  52]. 

3.3.2  JSM-2:  Common  sparse  supports 

In  this  model,  the  common  component  zc  is  equal  to  zero,  each  innovation  component  Zj  is  sparse, 
and  the  innovations  {zj}  share  the  same  sparse  support  but  have  different  nonzero  coefficients.  To 
formalize  this  setting  in  a  joint  sparsity  model  (JSM-2)  we  let  V  represent  the  set  of  all  matrices 
of  the  form  (5),  where  Pc  =  0  and  Pj  =  P  for  all  j  £  A.  Here  P  denotes  an  arbitrary  identity 
submatrix  of  size  N  x  K,  with  K  <C  N.  For  a  given  X  =  PQ,  we  may  again  partition  the  value 
vector  0  =  [9j  8j...  0Tj  ]r,  where  each  6j  £  Rft' .  It  is  easy  to  see  that  the  matrices  P  from  JSM-2 
are  full  rank.  Therefore,  when  sparsity  reduction  is  not  possible,  the  joint  sparsity  D  =  JK. 

The  JSM-2  model  is  immediately  applicable  to  acoustic  and  RF  sensor  arrays,  where  each 
sensor  acquires  a  replica  of  the  same  Fourier-sparse  signal  but  with  phase  shifts  and  attenuations 
caused  by  signal  propagation.  In  this  case,  it  is  critical  to  recover  each  one  of  the  sensed  signals. 
Another  useful  application  for  this  framework  is  MIMO  communication  [53]. 

Similar  signal  models  have  been  considered  in  the  area  of  simultaneous  sparse  approxima¬ 
tion  [53-55].  In  this  setting,  a  collection  of  sparse  signals  share  the  same  expansion  vectors  from 
a  redundant  dictionary.  The  sparse  approximation  can  be  recovered  via  greedy  algorithms  such  as 
Simultaneous  Orthogonal  Matching  Pursuit  (SOMP)  [53,  54]  or  MMV  Order  Recursive  Matching 
Pursuit  (M-ORMP)  [55].  We  use  the  SOMP  algorithm  in  our  setting  (Section  5.2)  to  recover  from 
incoherent  measurements  an  ensemble  of  signals  sharing  a  common  sparse  structure. 

3.3.3  JSM-3:  Nonsparse  common  component  sparse  innovations 

In  this  model,  we  suppose  that  each  signal  contains  an  arbitrary  common  component  zc  and  a 
sparse  innovation  component  Zj\  this  model  extends  JSM-1  by  relaxing  the  assumption  that  the 
common  component  zc  has  a  sparse  representation.  To  formalize  this  setting  in  the  JSM-3  model, 
we  let  V  represent  the  set  of  all  matrices  (5)  in  which  Pc  =  I ,  the  N  x  N  identity  matrix.  This 
implies  each  Kj  is  smaller  than  N  while  Kc  =  N ;  thus,  we  obtain  9c  £  and  9j  £  R Ki . 
Assuming  that  sparsity  reduction  is  not  possible,  the  joint  sparsity  D  =  N  +  We  also 

consider  the  specific  case  where  the  supports  of  the  innovations  are  shared  by  all  signals,  which 
extends  JSM-2;  in  this  case  we  will  have  Pj  =  P  for  all  j  £  A,  with  P  an  identity  submatrix  of  size 
N  x  K.  It  is  easy  to  see  that  in  this  case  sparsity  reduction  is  possible,  and  so  the  the  joint  sparsity 
can  drop  to  D  =  N  +  ( J  —  1  )K.  Note  that  separate  CS  recovery  is  impossible  in  JSM-3  with  any 
fewer  than  N  measurements  per  sensor,  since  the  common  component  is  not  sparse.  However,  we 
will  demonstrate  that  joint  CS  recovery  can  indeed  exploit  the  common  structure. 
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A  practical  situation  well-modeled  by  this  framework  is  where  several  sources  are  recorded  by 
different  sensors  together  with  a  background  signal  that  is  not  sparse  in  any  basis.  Consider,  for 
example,  a  verification  system  in  a  component  production  plant,  where  cameras  acquire  snapshots 
of  each  component  to  check  for  manufacturing  defects.  While  each  image  could  be  extremely 
complicated,  and  hence  nonsparse,  the  ensemble  of  images  will  be  highly  correlated,  since  each 
camera  is  observing  the  same  device  with  minor  (sparse)  variations. 

JSM-3  can  also  be  applied  in  non-distributed  scenarios.  For  example,  it  motivates  the  compres¬ 
sion  of  data  such  as  video,  where  the  innovations  or  differences  between  video  frames  may  be  sparse, 
even  though  a  single  frame  may  not  be  very  sparse.  In  this  case,  JSM-3  suggests  that  we  encode 
each  video  frame  separately  using  CS  and  then  decode  all  frames  of  the  video  sequence  jointly.  This 
has  the  advantage  of  moving  the  bulk  of  the  computational  complexity  to  the  video  decoder.  The 
PRISM  system  proposes  a  similar  scheme  based  on  Wyner-Ziv  distributed  encoding  [56]. 

There  are  many  possible  joint  sparsity  models  beyond  those  introduced  above,  as  well  as  beyond 
the  common  and  innovation  component  signal  model.  Further  work  will  yield  new  JSMs  suitable 
for  other  application  scenarios;  an  example  application  consists  of  multiple  cameras  taking  digital 
photos  of  a  common  scene  from  various  angles  [57].  Extensions  are  discussed  in  Section  6. 

4  Theoretical  Bounds  on  Measurement  Rates 

In  this  section,  we  seek  conditions  on  Xi  =  (Mi,  M2,  ■  ■  ■ ,  Mj),  the  tuple  of  number  of  measurements 
from  each  sensor,  such  that  we  can  guarantee  perfect  recovery  of  X  given  Y.  To  this  end,  we 
provide  a  graphical  model  for  the  general  framework  provided  in  Section  3.2.  This  graphical  model 
is  fundamental  in  the  derivation  of  the  number  of  measurements  needed  for  each  sensor,  as  well 
as  in  the  formulation  of  a  combinatorial  recovery  procedure.  Thus,  we  generalize  Theorem  1  to 
the  distributed  setting  to  obtain  fundamental  limits  on  the  number  of  measurements  that  enable 
recovery  of  sparse  signal  ensembles. 

Based  on  the  models  presented  in  Section  3,  recovering  X  requires  determining  a  value  vector  0 
and  location  matrix  P  such  that  X  =  PQ.  Two  challenges  immediately  present  themselves.  First, 
a  given  measurement  depends  only  on  some  of  the  components  of  0,  and  the  measurement  budget 
should  be  adjusted  between  the  sensors  according  to  the  information  that  can  be  gathered  on  the 
components  of  0.  For  example,  if  a  component  0(d)  does  not  affect  any  signal  coefficient  xj(-)  in 
sensor  j,  then  the  corresponding  measurements  yj  provide  no  information  about  ©(d).  Second,  the 
decoder  must  identify  a  location  matrix  P  e  Vp(X)  from  the  set  V  and  the  measurements  Y . 

4.1  Modeling  dependencies  using  bipartite  graphs 

We  introduce  a  graphical  representation  that  captures  the  dependencies  between  the  measurements 
in  Y  and  the  value  vector  0,  represented  by  $  and  P.  Consider  a  feasible  decomposition  of  X 
into  a  full-rank  matrix  P  €  Vp(X)  and  the  corresponding  0;  the  matrix  P  defines  the  sparsities 
of  the  common  and  innovation  components  Kq  and  Kj,  1  <  j  <  J,  as  well  as  the  joint  sparsity 
D  =  Kc  +  Kj.  Define  the  following  sets  of  vertices:  (i)  the  set  of  value  vertices  Vy  has 

elements  with  indices  d  €  {1, . . .  ,D}  representing  the  entries  of  the  value  vector  0(d),  and  (ii)  the 
set  of  measurement  vertices  Vm  has  elements  with  indices  (j,  m)  representing  the  measurements 
yj(m),  with  j  £  A  and  m  6  {1, . . . ,  Mj}.  The  cardinalities  for  these  sets  are  | Vy  =  D  and 
\VM\  =  M,  respectively. 

We  now  introduce  a  bipartite  graph  G  =  (Vy,  Vm,  E),  that  represents  the  relationships  between 
the  entries  of  the  value  vector  and  the  measurements  (see  [4]  for  details).  The  set  of  edges  E  is 
defined  as  follows: 
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Figure  1:  Bipartite  graph  for  distributed  compressive  sensing  (DCS).  The  bipartite  graph  G  =  (Vy,  Vm,E ) 
indicates  the  relationship  between  the  value  vector  coefficients  and  the  measurements. 

•  For  every  d  £  { 1,2, ... ,  Kc}  C  Vy  and  j  £  A  such  that  column  d  of  Pc  does  not  also  appear 
as  a  column  of  Pj ,  we  have  an  edge  connecting  d  to  each  vertex  (j,  m)  £  Vm  for  1  <  m  <  Mj . 

•  For  every  d  £  {Kc  +  F  Kc  +  2, . . . ,  D}  C  Vy,  we  consider  the  sensor  j  associated  with  column 
d  of  P,  and  we  have  an  edge  connecting  d  to  each  vertex  (j,  m)  £  Vm  for  1  <  m  <  Mj. 

In  words,  we  say  that  yj(m ),  the  mth  measurement  of  sensor  j .  measures  0(d)  if  the  vertex  d  £  Vy 
is  linked  to  the  vertex  (j,  m)  £  Vm  in  the  graph  G.  An  example  graph  for  a  distributed  sensing 
setting  is  shown  in  Figure  1. 

4.2  Quantifying  redundancies 

In  order  to  obtain  sharp  bounds  on  the  number  of  measurements  needed,  our  analysis  of  the  mea¬ 
surement  process  must  account  for  redundancies  between  the  locations  of  the  nonzero  coefficients  in 
the  common  and  innovation  components.  To  that  end,  we  consider  the  overlaps  between  common 
and  innovation  components  in  each  signal.  When  we  have  zc(n)  ^  0  and  Zj(n)  ^  0  for  a  certain 
signal  j  and  some  index  1  <  n  <  N,  we  cannot  recover  the  values  of  both  coefficients  from  the 
measurements  of  this  signal  alone;  therefore,  we  will  need  to  recover  zc(ri)  using  measurements  of 
other  signals  that  do  not  feature  the  same  overlap.  We  thus  quantify  the  size  of  the  overlap  for 
all  subsets  of  signals  T  C  A  under  a  feasible  representation  given  by  P  and  0,  as  described  in 
Section  3.2. 

Definition  2  The  overlap  size  for  the  set  of  signals  F  c  A,  denoted  Kc(T,P),  is  the  number  of 
indices  in  which  there  is  overlap  between  the  common  and  the  innovation  component  supports  at  all 
signals  j 


Kc{T,P)  =  \{n£  {1,...,N}  :  zc{n)  /  0  and  V  j  £  T  ,Zj(n)  /  0}|  .  (6) 

We  also  define  Kc(A-,P)  =  Kc{P)  and  Kc(fb,P)  =  0. 

For  r  C  A,  Kc{ r,  P)  provides  a  penalty  term  due  to  the  need  for  recovery  of  common  component 
coefficients  that  are  overlapped  by  innovations  in  all  other  signals  j  ^  T.  Intuitively,  for  each  entry 
counted  in  Kc( T,  P ),  some  sensor  in  F  must  take  one  measurement  to  account  for  that  entry  of  the 
common  component  —  it  is  impossible  to  recover  such  entries  from  measurements  made  by  sensors 
outside  of  T.  When  all  signals  j  £  A  are  considered,  it  is  clear  that  all  of  the  common  component 
coefficients  must  be  recovered  from  the  obtained  measurements. 
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4.3  Measurement  bounds 


Converse  and  achievable  bounds  for  the  number  of  measurements  necessary  for  DCS  recovery  are 
given  below.  Our  bounds  consider  each  subset  of  sensors  T  C  A,  since  the  cost  of  sensing  the 
common  component  can  be  amortized  across  sensors:  it  may  be  possible  to  reduce  the  rate  at  one 
sensor  j\  £  T  (up  to  a  point),  as  long  as  other  sensors  in  T  offset  the  rate  reduction.  We  quantify 
the  reduction  possible  through  the  following  definition. 


Definition  3  The  conditional  sparsity  of  the  set  of  signals  T  is  the  number  of  entries  of  the  vector 
0  that  must  be  recovered  by  measurements  yj,  j  £  T: 


-Kcond(r,.P) 


jer  ) 


+  KC(T,P). 


The  joint  sparsity  gives  the  number  of  degrees  of  freedom  for  the  signals  in  A,  while  the  conditional 
sparsity  gives  the  number  of  degrees  of  freedom  for  signals  in  T  when  the  signals  in  A  \  T  are 
available  as  side  information.  Note  also  that  Definition  1  for  joint  sparsity  can  be  extended  to  a 
subset  of  signals  T  by  considering  the  number  of  entries  of  0  that  affect  these  signals: 


Aj„int(r, p)  =  d-  k,, . (A  - r, P) 


E'M-P}) 

jer  / 


+  KC(P)  -  KC(A\F,P). 


Note  that  Kcond(A,P)  =  K)oirit(A,  P)  =  D. 

The  bipartite  graph  introduced  in  Section  4.1  is  the  cornerstone  of  Theorems  3,  4,  and  5,  which 
consider  whether  a  perfect  matching  can  be  found  in  the  graph;  see  the  proofs  in  Appendices  B,  D, 
and  E,  respectively,  for  detail. 


Theorem  3  (Achievable,  known  P)  Assume  that  a  signal  ensemble  X  is  obtained  from  a  com¬ 
mon/innovation  component  JSM  V.  Let  A4  =  (M\,  M2,  ■  ■  ■ ,  Mj)  be  a  measurement  tuple,  let 
be  random  matrices  having  Mj  rows  of  i.i.d.  Gaussian  entries  for  each  j  £  A,  and  write 
Y  =  &X .  Suppose  there  exists  a  full  rank  location  matrix  P  £  Vf{X)  such  that 

>  Kcond(T.  P)  (7) 

ier 

for  all  T  C  A.  Then  with  probability  one  over  {<3?j}jGr>  there  exists  a  unique  solution  0  to  the  system 
of  equations  Y  =  4>P0;  hence,  the  signal  ensemble  X  can  be  uniquely  recovered  as  X  =  PQ. 


Theorem  4  (Achievable,  unknown  P)  Assume  that  a  signal  ensemble  X  and  measurement  matri¬ 
ces  {‘hjljgA  follow  the  assumptions  of  Theorem  3.  Suppose  there  exists  a  full  rank  location  matrix 
P*  £  Vf(X)  such  that 

^  Keener,  P*)  +  \T\  (8) 

ier 

for  all  T  C  A.  Then  X  can  be  uniquely  recovered  from  Y  with  probability  one  over  {4>:,-}jgr- 


Theorem  5  (Converse)  Assume  that  a  signal  ensemble  X  and  measurement  matrices  {<f>j}jeA 
follow  the  assumptions  of  Theorem  3.  Suppose  there  exists  a  full  rank  location  matrix  P  £  Vf{X) 
such  that 

Y,m3  <Kcond(r,P)  (9) 

jer 
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for  some  TCA.  Then  there  exists  a  solution  0  such  that  Y  =  <i>P0  bid  X  :=  PQ  ^  X . 

The  identification  of  a  feasible  location  matrix  P  causes  the  one  measurement  per  sensor  gap  that 
prevents  (8)-(9)  from  being  a  tight  converse  and  achievable  bound  pair.  We  note  in  passing  that  the 
signal  recovery  procedure  used  in  Theorem  4  is  akin  to  to-norm  minimization  on  X ;  see  Appendix  D 
for  details. 

4.4  Discussion 

The  bounds  in  Theorems  3-5  are  dependent  on  the  dimensionality  of  the  subspaces  in  which  the 
signals  reside.  The  number  of  noiseless  measurements  required  for  ensemble  recovery  is  determined 
by  the  dimensionality  dim(5)  of  the  subspace  S  in  the  relevant  signal  model,  because  dimensionality 
and  sparsity  play  a  volumetric  role  akin  to  the  entropy  H  used  to  characterize  rates  in  source 
coding.  Whereas  in  source  coding  each  bit  resolves  between  two  options,  and  2NH  typical  inputs 
are  described  using  NH  bits  [12],  in  CS  we  have  M  =  dim(5)  +  0(1).  Similar  to  Slepian-Wolf 
coding  [13],  the  number  of  measurements  required  for  each  sensor  must  account  for  the  minimal 
features  unique  to  that  sensor,  while  at  the  same  time  features  that  appear  among  multiple  sensors 
must  be  amortized  over  the  group. 

Theorems  3-5  can  also  be  applied  to  the  single  sensor  and  joint  measurement  settings.  In  the 
single-signal  setting  (Theorem  1),  we  will  have  x  =  P6  with  0  £  RA,  and  A  =  {1};  Theorem  4 
provides  the  requirement  M  >  K  +  1.  It  is  easy  to  show  that  the  joint  measurement  is  equivalent 
to  the  single-signal  setting:  we  stack  all  the  individual  signals  into  a  single  signal  vector,  and  in 
both  cases  all  measurements  are  dependent  on  all  the  entries  of  the  signal  vector.  However,  the 
distribution  of  the  measurements  among  the  available  sensors  is  irrelevant  in  a  joint  measurement 
setting.  Therefore,  we  only  obtain  a  necessary  condition  ]>T  Mj  >  D  +  1  on  the  total  number  of 
measurements  required. 

5  Practical  Recovery  Algorithms  and  Experiments 

Although  we  have  provided  a  unifying  theoretical  treatment  for  the  three  JSM  models,  the  nuances 
warrant  further  study.  In  particular,  while  Theorem  4  highlights  the  basic  tradeoffs  that  must 
be  made  in  partitioning  the  measurement  budget  among  sensors,  the  result  does  not  by  design 
provide  insight  into  tractable  algorithms  for  signal  recovery.  We  believe  there  is  additional  insight 
to  be  gained  by  considering  each  model  in  turn,  and  while  the  presentation  may  be  less  unified,  we 
attribute  this  to  the  fundamental  diversity  of  problems  that  can  arise  under  the  umbrella  of  jointly 
sparse  signal  representations.  In  this  section,  we  focus  on  tractable  recovery  algorithms  for  each 
model  and,  when  possible,  analyze  the  corresponding  measurement  requirements. 

5.1  Recovery  strategies  for  sparse  common  +  innovations  (JSM-1) 

We  first  characterize  the  sparse  common  signal  and  innovations  model  JSM-1  from  Section  3.3.1. 
For  simplicity,  we  limit  our  description  to  J  =  2  signals,  but  describe  extensions  to  multiple  signals 
as  needed. 

5.1.1  Measurement  bounds  for  joint  recovery 

Under  the  JSM-1  model,  separate  recovery  of  the  signal  Xj  via  £o-norm  minimization  would  require 
joint  ({j  } )  +  1  =  A-cond  ({j  } )  +  1  =  KC  +  Kj  ~  Kc(  A  \  { j })  +  1  measurements,  where  Kc{  A  \  {j}) 
accounts  for  sparsity  reduction  due  to  overlap  between  zc  and  Zj.  We  apply  Theorem  4  to  the  JSM- 
1  model  to  obtain  the  corollary  below.  To  address  the  possibility  of  sparsity  reduction,  we  denote 
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by  Kr  the  number  of  indices  in  which  the  common  component  zc  and  all  innovation  components 
Zj,  j  £  A  overlap;  this  results  in  sparsity  reduction  for  the  common  component. 

Corollary  1  Assume  the  measurement  matrices  contain  i.i.d.  Gaussian  entries.  Then 

the  signal  ensemble  X  can  be  recovered  with  probability  one  if  the  following  conditions  hold: 


jer 


+  KC(T)  +  \T 


IV  A, 


E  M> 
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Kc  + 


+  J  -  Kr. 


Our  joint  recovery  scheme  provides  a  significant  savings  in  measurements,  because  the  common 
component  can  be  measured  as  part  of  any  of  the  J  signals. 


5.1.2  Stochastic  signal  model  for  JSM-1 

To  give  ourselves  a  firm  footing  for  analysis,  in  the  remainder  of  Section  5.1  we  use  a  stochastic 
process  for  JSM-1  signal  generation.  This  framework  provides  an  information  theoretic  setting 
where  we  can  scale  the  size  of  the  problem  and  investigate  which  measurement  rates  enable  recovery. 
We  generate  the  common  and  innovation  components  as  follows.  For  n  £  {1, . . . ,  N}  the  decision 
whether  zc{n)  and  Zj(n )  is  zero  or  not  is  an  i.i.d.  Bernoulli  process,  where  the  probability  of  a 
nonzero  value  is  given  by  parameters  denoted  Sc  and  Sj,  respectively.  The  values  of  the  nonzero 
coefficients  are  then  generated  from  an  i.i.d.  Gaussian  distribution.  The  outcome  of  this  process  is 
that  zc  and  Zj  have  sparsities  Kc  ~  Binomial(Af,  Sc)  and  Kj  ~  Binomial (IV,  Sj).  The  parameters 
Sj  and  Sc  are  sparsity  rates  controlling  the  random  generation  of  each  signal.  Our  model  resembles 
the  Gaussian  spike  process  [58],  which  is  a  limiting  case  of  a  Gaussian  mixture  model. 

Likelihood  of  sparsity  reduction  and  overlap:  This  stochastic  model  can  yield  signal 
ensembles  for  which  the  corresponding  generating  matrices  P  allow  for  sparsity  reduction;  specif¬ 
ically,  there  might  be  overlap  between  the  supports  of  the  common  component  zc  and  all  the 
innovation  components  Zj,  j  £  A.  For  J  =  2,  the  probability  that  a  given  index  is  present  in 
all  supports  is  Sr  :=  ScSiS2.  Therefore,  the  distribution  of  the  cardinality  of  this  overlap  is 
Kr  Binomial ( N,  Sr).  We  must  account  for  the  reduction  obtained  from  the  removal  of  the 
corresponding  number  of  columns  from  the  location  matrix  P  when  the  total  number  of  mea¬ 
surements  Mi  +  M2  is  considered.  In  the  same  way  we  can  show  that  the  distributions  for  the 
number  of  indices  in  the  overlaps  required  by  Corollary  1  are  Kc{{P\)  ~  Binomial(Af,  Sc  /u)  and 
Kc({ 2})  ~  Binomial (iV,  Sc, {2}),  where  Sc,{  1}  :=  Sc(l  -  Si)S2  and  S'c,{2}  :=  ScSi(l  -  S'2). 

Measurement  rate  region:  To  characterize  DCS  recovery  performance,  we  introduce  a 
measurement  rate  region.  We  define  the  measurement  rate  Rj  in  an  asymptotic  manner  as 


„  ,  An 

Rj  :=  lim  —f,  j  £  A. 

3  tv^oo  N 


Additionally,  we  note  that 


lim  -V  =  Sc  and  lim  —f  =  Sj,  j  £  A 
N^oo  N  00  N  3 


Thus,  we  also  set  Sxj  =  Sc  +  Sj  —  ScSj,  j  £  {1,2}.  For  a  measurement  rate  pair  (R\,R2)  and 
sources  X\  and  X2,  we  evaluate  whether  we  can  recover  the  signals  with  vanishing  probability  of 
error  as  N  increases.  In  this  case,  we  say  that  the  measurement  rate  pair  is  achievable. 
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For  jointly  sparse  signals  under  JSM-1,  separate  recovery  via  fo-norm  minimization  would 
require  a  measurement  rate  Rj  =  Sx:i  ■  Separate  recovery  via  t 1  -norm  minimization  would  require 
an  overmeasuring  factor  c(Sxj),  and  thus  the  measurement  rate  would  become  Rj  =  Sxj  ■c(Sxj)-  To 
improve  upon  these  figures,  we  adapt  the  standard  machinery  of  CS  to  the  joint  recovery  problem. 

5.1.3  Joint  recovery  via  4-norm  minimization 

As  discussed  in  Section  2.3,  solving  an  to-norm  minimization  is  NP-hard,  and  so  in  practice  we 
must  relax  our  £q  criterion  in  order  to  make  the  solution  tractable.  We  now  study  what  penalty 
must  be  paid  for  tj-norm  recovery  of  jointly  sparse  signals.  Using  the  vector  and  frame 


1 

1 _ 

and  4>  :  = 

'  Ti 

Ti 

0 

4>2 

0 

4*2 

L  ^2  J 

(11) 


we  can  represent  the  concatenated  measurement  vector  Y  sparsely  using  the  concatenated  coefficient 
vector  Z .  which  contains  Kc+K\-\-K2—Kr  nonzero  coefficients,  to  obtain  Y  =  4>Z.  With  sufficient 
over  measuring,  we  have  seen  experimentally  that  it  is  possible  to  recover  a  vector  Z,  which  yields 
Xj  =  zq  +  Zj,  j  =  1,2,  by  solving  the  weighted  fi-norm  minimization 

Z  =  argminycll^clli  +  7i|M|i  +  72IMI1  s.t.  y  =  4>Z,  (12) 

where  7cs7i>72  >  0.  We  call  this  the  7 -weighted  £\-norm  formulation ;  our  numerical  results 
(Section  5.1.6  and  our  technical  report  [59])  indicate  a  reduction  in  the  requisite  number  of  mea¬ 
surements  via  this  enhancement.  If  K\  =  K2  and  M\  =  M2,  then  without  loss  of  generality  we  set 
71  =  72  =  1  and  numerically  search  for  the  best  parameter  7 c-  We  discuss  the  asymmetric  case 
with  Ki  =  A 2  and  Mi  7^  M2  in  the  technical  report  [59]. 

5.1.4  Converse  bound  on  performance  of  7-weighted  fi-norm  minimization 

We  now  provide  a  converse  bound  that  describes  what  measurement  rate  pairs  cannot  be  achieved 
via  the  7-weighted  1 1  -norm  minimization.  Our  notion  of  a  converse  focuses  on  the  setting  where 
each  signal  xj  is  measured  via  multiplication  by  the  Mj  by  N  matrix  4>;  and  joint  recovery  is 
performed  via  our  7-weighted  1 1  -norm  formulation  (12).  Within  this  setting,  a  converse  region  is  a 
set  of  measurement  rates  for  which  the  recovery  fails  with  overwhelming  probability  as  N  increases. 

We  assume  that  J  =  2  sources  have  innovation  sparsity  rates  that  satisfy  S\  =  S2  =  Sj. 
Our  first  result,  proved  in  Appendix  F,  provides  deterministic  necessary  conditions  to  recover  the 
components  zc,  z\,  and  Z2,  using  the  7-weighted  t \ -norm  formulation  (12).  We  note  that  the  lemma 
holds  for  all  such  combinations  of  components  that  generate  the  same  signals  x\  =  zc  +  z\  and 
X2  =  zc  +  22- 


Lemma  1  Consider  any  7 c,  71,  and  72  in  the  'y-weighted  £\-norm  formulation  (12).  The  com¬ 
ponents  zc,  Z\ ,  and  Z2  can  be  recovered  using  measurement  matrices  4q  and  d>2  only  if  (i)  z\  can 
be  recovered  via  l\-norm  minimization  (3)  using  <l>i  and  measurements  <&\z\;  (ii)  Z2  can  be  recov¬ 
ered  via  Hi-norm  minimization  using  d>2  and  measurements  <£222;  and  (in)  zc  can  be  recovered  via 
li-norm  minimization  using  the  joint  matrix  [<F^  3*2  P  anc^  measurements  [d>[(  4*2  ]Tzc- 

Lemma  1  can  be  interpreted  as  follows.  If  M\  and  M2  are  not  large  enough  individually,  then 
the  innovation  components  z\  and  Z2  cannot  be  recovered.  This  implies  a  converse  bound  on  the 
individual  measurement  rates  R\  and  f?2-  Similarly,  combining  Lemma  1  with  the  converse  bound 
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Figure  2:  Recovering  a  signal  ensemble  with  sparse  common  +  innovations  (JSM-1 ).  We  chose  a 
common  component  sparsity  rate  Sc  =  0.2  and  innovation  sparsity  rates  Sj  =  S\  =  S2  =  0.05.  Our 
simulation  results  use  the  y-weighted  £\-norm  formulation  (12)  on  signals  of  length  N  =  1000;  the 
measurement  rate  pairs  that  achieved  perfect  recovery  over  100  simulations  are  denoted  by  circles. 


of  Theorem  2  for  single-source  -norm  minimization  of  the  common  component  zq  implies  a  lower 
bound  on  the  sum  measurement  rate  R.\  +  i?2- 

Anticipated  converse:  As  shown  in  Corollary  1,  for  indices  n  such  that  x\(n)  and  X2 (n)  differ 
and  are  nonzero,  each  sensor  must  take  measurements  to  account  for  one  of  the  two  coefficients.  In 
the  case  where  S\  =  S2  =  Si,  the  joint  sparsity  rate  is  Sc  +  2Si~  ScSj .  We  define  the  measurement 
function  c'(S)  :=  S  ■  c(S )  based  on  Donoho  and  Tanner’s  oversampling  factor  c(S)  (Theorem  2). 
It  can  be  shown  that  the  function  c'(-)  is  concave;  in  order  to  minimize  the  sum  rate  bound,  we 
“explain”  as  many  of  the  sparse  coefficients  in  one  of  the  signals  and  as  few  as  possible  in  the  other. 
From  Corollary  1,  we  have  R.\ ,  R.2  >  Si  +  Sc  Si  —  ScSj.  Consequently,  one  of  the  signals  must 
“explain”  this  sparsity  rate,  whereas  the  other  signal  must  explain  the  rest: 

[Sc  +  2 S,  -  ScSj }  -  [5/  +  ScSi  -  ScSj }  =  Sc  +  S,  -  ScSr. 

Unfortunately,  the  derivation  of  c'(S )  relies  on  Gaussianity  of  the  measurement  matrix,  whereas 
in  our  case  T  has  a  block  matrix  form.  Therefore,  the  following  conjecture  remains  to  be  proved 
rigorously. 

Conjecture  1  Let  J  =  2  and  fix  the  sparsity  rate  of  the  common  component  Sc  and  the  innovation 
sparsity  rates  S\  =  S2  =  Sj.  Then  the  following  conditions  on  the  measurement  rates  are  necessary 
to  enable  recovery  with  probability  one: 

Rj  >  c'  (Si  +  ScSi  -  ScSj )  ,  j  =  1, 2, 

R1  +  R2  >  c1  (Si  +  ScSi  -  ScSj) +c' (Sc  +  Si -ScSi). 


5.1.5  Achievable  bound  on  performance  of  fj-norm  minimization 

We  have  not  yet  characterized  the  performance  of  7-weighted  Id-norm  formulation  (12)  analytically. 
Instead,  Theorem  6  below  uses  an  alternative  Id-norm  based  recovery  technique.  The  proof  describes 
a  constructive  recovery  algorithm.  We  construct  measurement  matrices  and  $2j  each  consisting 
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of  two  parts.  The  first  parts  of  the  matrices  are  identical  and  recover  x\  —  X2-  The  second  parts  of 
the  matrices  are  different  and  enable  the  recovery  of  \x\  +  ^ X2 ■  Once  these  two  components  have 
been  recovered,  the  computation  of  x\  and  x'2  is  straightforward.  The  measurement  rate  can  be 
computed  by  considering  both  identical  and  different  parts  of  the  measurement  matrices. 

Theorem  6  Let  J  =  2,  N  — >  oo  and  fix  the  sparsity  rate  of  the  common  component  Sc  and  the 

innovation  sparsity  rates  Si  =  S2  =  Sj .  If  the  measurement  rates  satisfy  the  following  conditions: 

Rj  >  c'{2SI  -  Sj),  j  =  1,2,  (13a) 

R1  +  R2  >  c'(2SI-S'j)  +  c,(Sc  +  2SI-2ScSI-S'j  +  ScS'j),  (13b) 

then  we  can  design  measurement  matrices  and  $2  with  random  Gaussian  entries  and  an  i\-norm 
minimization  recovery  algorithm  that  succeeds  with  probability  approaching  one  as  N  increases. 
Furthermore,  as  Si  0  the  sum  measurement  rate  approaches  d (Sc)- 

The  theorem  is  proved  in  Appendix  G.  The  recovery  algorithm  of  Theorem  6  is  based  on  linear 
programming.  It  can  be  extended  from  J  =  2  to  an  arbitrary  number  of  signals  by  recovering  all 
signal  differences  of  the  form  xn  —Xj2  in  the  first  stage  of  the  algorithm  and  then  recovering  -j  )TF  Xj 
in  the  second  stage.  In  contrast,  our  7-weighted  1 1  -norm  formulation  (12)  recovers  a  length-JIV 
signal.  Our  simulation  experiments  (Section  5.1.6)  indicate  that  the  7-weighted  formulation  can 
recover  using  fewer  measurements  than  the  approach  of  Theorem  6. 

The  achievable  measurement  rate  region  of  Theorem  6  is  loose  with  respect  to  the  region  of  the 
anticipated  converse  Conjecture  1  (see  Figure  2).  We  leave  for  future  work  the  characterization  of  a 
tight  measurement  rate  region  for  computationally  tractable  (polynomial  time)  recovery  techniques. 

5.1.6  Simulations  for  JSM-1 

We  now  present  simulation  results  for  several  different  JSM-1  settings.  The  7-weighted  G-norm 
formulation  (12)  was  used  throughout,  where  the  optimal  choice  of  7 q,  71,  and  72  depends  on  the 
relative  sparsities  Kc,  K\ ,  and  K2.  The  optimal  values  have  not  been  determined  analytically.  In¬ 
stead,  we  rely  on  a  numerical  optimization,  which  is  computationally  intense.  A  detailed  discussion 
of  our  intuition  behind  the  choice  of  7  appears  in  the  technical  report  [59]. 

Recovering  two  signals  with  symmetric  measurement  rates:  Our  simulation  setting  is 
as  follows.  The  signal  components  zc ,  z\,  and  Z2  are  assumed  (without  loss  of  generality)  to  be 
sparse  in  T  =  In  with  sparsities  Kc,  K\ ,  and  K2,  respectively.  We  assign  random  Gaussian  values 
to  the  nonzero  coefficients.  We  restrict  our  attention  to  the  symmetric  setting  in  which  K\  =  K2 
and  M\  =  M2,  and  consider  signals  of  length  N  =  50  where  Kc  +  K\  +  K2  =  15. 

In  our  joint  decoding  simulations,  we  consider  values  of  Mi  and  M2  in  the  range  between  10 
and  40.  We  find  the  optimal  7 c  in  the  7-weighted  I \ -norm  formulation  (12)  using  a  line  search 
optimization,  where  simulation  indicates  the  “goodness”  of  specific  7 c  values  in  terms  of  the  like¬ 
lihood  of  recovery.  With  the  optimal  7 c,  for  each  set  of  values  we  run  several  thousand  trials  to 
determine  the  empirical  probability  of  success  in  decoding  zi  and  Z2 ■  The  results  of  the  simulation 
are  summarized  in  Figure  3.  The  savings  in  the  number  of  measurements  M  can  be  substantial, 
especially  when  the  common  component  Kc  is  large  (Figure  3).  For  Kc  =  11,  K\  =  K2  =  2, 
M  is  reduced  by  approximately  30%.  For  smaller  I\c,  joint  decoding  barely  outperforms  separate 
decoding,  since  most  of  the  measurements  are  expended  on  innovation  components.  Additional 
results  appear  in  [59]. 

Recovering  two  signals  with  asymmetric  measurement  rates:  In  Figure  2,  we  compare 
separate  CS  recovery  with  the  anticipated  converse  bound  of  Conjecture  1,  the  achievable  bound 
of  Theorem  6,  and  numerical  results. 
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K  =  1 1 ,  K  =  K  =  2,  N  =  50,  Y„  =  0.905  K  =  3,  K  =  K  =  6,  N  =  50,  =  1 .425 

L>  I  d.  L>  c  L»  \  d.  L> 


Number  of  Measurements  per  Signal,  M  Number  of  Measurements  per  Signal,  M 

Figure  3:  Comparison  of  joint  decoding  and  separate  decoding  for  JSM-1.  The  advantage  of  joint 
over  separate  decoding  depends  on  the  common  component  sparsity. 


Figure  4:  Multi-sensor  measurement  results  for  JSM-1.  We  choose  a  common  component  sparsity 
rate  Sc  =  0.2,  innovation  sparsity  rates  Sj  =  0.05,  and  signals  of  length  N  =  500;  our  results 
demonstrate  a  reduction  in  the  measurement  rate  per  sensor  as  the  number  of  sensors  J  increases. 


We  use  J  =  2  signals  and  choose  a  common  component  sparsity  rate  Sc  =  0.2  and  innovation 
sparsity  rates  Sj  =  S±  =  S2  =  0.05.  We  consider  several  different  asymmetric  measurement  rates. 
In  each  such  setting,  we  constrain  M2  to  have  the  form  M2  =  otM\  for  some  a ,  with  N  =  1000. 
The  results  plotted  indicate  the  smallest  pairs  (Mi,  M2)  for  which  we  always  succeeded  recovering 
the  signal  over  100  simulation  runs.  In  some  areas  of  the  measurement  rate  region  our  7-weighted 
1 1  -norm  formulation  (12)  requires  fewer  measurements  than  the  achievable  approach  of  Theorem  6. 

Recovering  multiple  signals  with  symmetric  measurement  rates:  The  7-weighted  i\- 
norrn  recovery  technique  of  this  section  is  especially  promising  when  J  >  2  sensors  are  used.  These 
savings  may  be  valuable  in  applications  such  as  sensor  networks,  where  data  may  contain  strong 
spatial  (inter-source)  correlations. 

We  use  J  G  { 1,2,...,  10}  signals  and  choose  the  same  sparsity  rates  Sc  =  0.2  and  Sj  =  0.05 
as  the  asymmetric  rate  simulations;  here  we  use  symmetric  measurement  rates  and  let  N  =  500. 
The  results  of  Figure  4  describe  the  smallest  symmetric  measurement  rates  for  which  we  always 
succeeded  recovering  the  signal  over  100  simulation  runs.  As  J  increases,  lower  measurement  rates 
can  be  used;  the  results  compare  favorably  with  the  lower  bound  from  Conjecture  1,  which  gives 
Rj  ~  0.232  as  J  — >  00. 
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5.2  Recovery  strategies  for  common  sparse  supports  (JSM-2) 

Under  the  JSM-2  signal  ensemble  model  from  Section  3.3.2,  separate  recovery  of  each  signal  via 
fo-nornr  minimization  would  require  K  +  1  measurements  per  signal,  while  separate  recovery  via 
tj-norm  minimization  would  require  cK  measurements  per  signal.  When  Theorems  4  and  5  are 
applied  in  the  context  of  JSM-2,  the  bounds  for  joint  recovery  match  those  of  individual  recovery 
using  to-norm  minimization.  Within  this  context,  it  is  also  possible  to  recover  one  of  the  signals 
using  K  +  1  measurements  from  the  corresponding  sensor,  and  then  with  the  prior  knowledge  of 
the  support  set  S2,  recover  all  other  signals  from  I\  measurements  per  sensor;  thus  providing  an 
additional  savings  of  J  —  1  measurements  [60].  Surprisingly,  we  will  demonstrate  below  that  for 
large  J,  the  common  support  set  can  actually  be  recovered  using  only  one  measurement  per  sensor 
and  algorithms  that  are  computationally  tractable. 

The  algorithms  we  propose  are  inspired  by  conventional  greedy  pursuit  algorithms  for  CS 
(such  as  OMP  [26]).  In  the  single-signal  case,  OMP  iteratively  constructs  the  sparse  support  set  fl; 
decisions  are  based  on  inner  products  between  the  columns  of  $  and  a  residual.  In  the  multi-signal 
case,  there  are  more  clues  available  for  determining  the  elements  of  O. 

5.2.1  Recovery  via  Trivial  Pursuit  (TP) 

When  there  are  many  correlated  signals  in  the  ensemble,  a  simple  non-iterative  greedy  algorithm 
based  on  inner  products  will  suffice  to  recover  the  signals  jointly.  For  simplicity  but  without  loss 
of  generality,  we  assume  that  an  equal  number  of  measurements  Mj  =  M  are  taken  of  each  signal. 
We  write  in  terms  of  its  columns,  with  J?.,  =  [4>jti,  4>j, 2,  ■  ■  ■ ,  4>j,N\- 

Trivial  Pursuit  (TP)  Algorithm  for  JSM-2 

1.  Get  greedy:  Given  all  of  the  measurements,  compute  the  test  statistics 

1  J 

?«=  jEfe’W2’  ne{l,2  (14) 

3  = 1 

and  estimate  the  elements  of  the  common  coefficient  support  set  by 

Q  =  {n  having  one  of  the  K  largest 


When  the  sparse,  nonzero  coefficients  are  sufficiently  generic  (as  defined  below),  we  have  the 
following  surprising  result,  which  is  proved  in  Appendix  H. 

Theorem  7  Let  \k  be  an  orthonormal  basis  for  MN ,  let  the  measurement  matrices  contain  i.i.d. 
Gaussian  entries,  and  assume  that  the  nonzero  coefficients  in  the  9j  are  i.i.d.  Gaussian  random 
variables.  Then  with  M  >  1  measurements  per  signal,  TP  recovers  Q  with  probability  approaching 
one  as  J  — >  oo. 

In  words,  with  fewer  than  K  measurements  per  sensor,  it  is  actually  possible  to  recover  the 
sparse  support  set  fl  under  the  JSM-2  model.5  Of  course,  this  approach  does  not  recover  the  K 
coefficient  values  for  each  signal;  at  least  K  measurements  per  sensor  are  required  for  this. 

Corollary  2  Assume  that  the  nonzero  coefficients  in  the  Qj  are  i.i.d.  Gaussian  random  variables. 
Then  the  following  statements  hold: 

5  One  can  also  show  the  somewhat  stronger  result  that,  as  long  as  ff  -  Mj  N,  TP  recovers  ft  with  probability 
approaching  one.  We  have  omitted  this  additional  result  for  brevity. 
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1.  Let  the  measurement  matrices  contain  i.i.d.  Gaussian  entries,  with  each  matrix  having  an 
overmeasuring  factor  of  c  =  1  (that  is,  Mj  =  K  for  each  measurement  matrix  $j).  Then  TP 
recovers  all  signals  from  the  ensemble  {xj}  with  probability  approaching  one  as  J  — >  oo. 

2.  Let  be  a  measurement  matrix  with  overmeasuring  factor  c  <  1  (that  is,  Mj  <  K),  for 
some  j  G  A.  Then  with  probability  one,  the  signal  Xj  cannot  be  uniquely  recovered  by  any 
algorithm  for  any  value  of  J. 

The  first  statement  is  an  immediate  corollary  of  Theorem  7;  the  second  statement  follows 
because  each  equation  yj  =  &jXj  would  be  underdetermined  even  if  the  nonzero  indices  were 
known.  Thus,  under  the  JSM-2  model,  the  TP  algorithm  asymptotically  performs  as  well  as 
an  oracle  decoder  that  has  prior  knowledge  of  the  locations  of  the  sparse  coefficients.  From  an 
information  theoretic  perspective,  Corollary  2  provides  tight  achievable  and  converse  bounds  for 
JSM-2  signals.  We  should  note  that  the  theorems  in  this  section  have  a  slightly  different  flavor 
than  Theorem  4  and  5,  which  ensure  recovery  of  any  sparse  signal  ensemble,  given  a  suitable  set 
of  measurement  matrices.  Theorem  7  and  Corollary  2  above,  in  contrast,  rely  on  a  random  signal 
model  and  do  not  guarantee  simultaneous  performance  for  all  sparse  signals  under  any  particular 
measurement  ensemble.  Nonetheless,  we  feel  this  result  is  worth  presenting  to  highlight  the  strong 
subspace  concentration  behavior  that  enables  the  correct  identification  of  the  common  support. 

In  the  technical  reports  [59,61],  we  derive  an  approximate  formula  for  the  probability  of  error 
in  recovering  the  common  support  set  Ll  given  J,  K,  M ,  and  N .  While  theoretically  interesting  and 
potentially  practically  useful,  these  results  require  J  to  be  large.  Our  numerical  experiments  show 
that  the  number  of  measurements  required  for  recovery  using  TP  decreases  quickly  as  J  increases. 
However,  in  the  case  of  small  J,  TP  performs  poorly.  Hence,  we  propose  next  an  alternative 
recovery  technique  based  on  simultaneous  greedy  pursuit  that  performs  well  for  small  J . 

5.2.2  Recovery  via  iterative  greedy  pursuit 

In  practice,  the  common  sparse  support  among  the  J  signals  enables  a  fast  iterative  algorithm 
to  recover  all  of  the  signals  jointly.  Tropp  and  Gilbert  have  proposed  one  such  algorithm,  called 
Simultaneous  Orthogonal  Matching  Pursuit  (SOMP)  [53],  which  can  be  readily  applied  in  our  DCS 
framework.  SOMP  is  a  variant  of  OMP  that  seeks  to  identify  Q  one  element  at  a  time.  A  similar 
simultaneous  sparse  approximation  algorithm  has  been  proposed  using  convex  optimization  [62]. 
We  dub  the  DCS-tailored  SOMP  algorithm  DCS-SOMP. 

To  adapt  the  original  SOMP  algorithm  to  our  setting,  we  first  extend  it  to  cover  a  different 
measurement  matrix  (h;  for  each  signal  Xj.  Then,  in  each  DCS-SOMP  iteration,  we  select  the 
column  index  n  G  {1,2,...,  A7}  that  accounts  for  the  greatest  amount  of  residual  energy  across 
all  signals.  As  in  SOMP,  we  orthogonalize  the  remaining  columns  (in  each  measurement  matrix) 
after  each  step;  after  convergence  we  obtain  an  expansion  of  the  measurement  vector  yj  on  an 
orthogonalized  subset  of  the  columns  of  basis  vectors.  To  obtain  the  expansion  coefficients  in  the 
sparse  basis,  we  then  reverse  the  orthogonalization  process  using  the  QR  matrix  factorization. 
Finally,  we  again  assume  that  Mj  =  M  measurements  per  signal  are  taken. 


DCS-SOMP  Algorithm  for  JSM-2 

1.  Initialize:  Set  the  iteration  counter  £  =  1.  For  each  signal  index  j  G  A,  initialize  the 
orthogonalized  coefficient  vectors  (3j  =  0,  /3j  G  Mm;  also  initialize  the  set  of  selected  indices 
Q  =  0.  Let  Tjn  denote  the  residual  of  the  measurement  yj  remaining  after  the  first  l  iterations, 
and  initialize  rjt o  =  yj. 
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2.  Select  the  dictionary  vector  that  maximizes  the  value  of  the  sum  of  the  magnitudes  of  the 
projections  of  the  residual,  and  add  its  index  to  the  set  of  selected  indices 
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3.  Orthogonalize  the  selected  basis  vector  against  the  orthogonalized  set  of  previously  selected 
dictionary  vectors 

__  ,  ^3inl >  'lj,t) 

4.  Iterate:  Update  the  estimate  of  the  coefficients  for  the  selected  vector  and  residuals 

{rj,£-li  7 j,() 


m  = 


mill 


rj /  —  rj/~  i 


IlvdII 


5.  Check  for  convergence:  If  ||rj^||2  >  e 1 1 i/y 1 1 2  for  all  j,  then  increment  £  and  go  to  Step 
2;  otherwise,  continue  to  Step  6.  The  parameter  e  determines  the  target  error  power  level 
allowed  for  algorithm  convergence. 

6.  De-orthogonalize:  Consider  the  relationship  between  T  j  =  [77 1 , 7j,2;  ■  •  • ,  7j ,m]  and  the 

given  by  the  QR  factorization  (Iv  ^  =  TjRj,  where  ^  =  \4>jinii  4>j,nn  ■  ■  ■  j  ^,nM]  is  the  so- 
called  mutilated  basis.6  Since  yj  =  Tj/3j  =  =  TjRjx-^  where  x-^  is  the  mutilated 

coefficient  vector,  we  can  compute  the  signal  estimates  { Xj }  as 

xj,h  =  Rj  ft 'J ’ 

where  x.  ^  is  the  mutilated  version  of  the  sparse  coefficient  vector  Xj . 

In  practice,  we  obtain  cK  measurements  from  each  signal  Xj  for  some  value  of  c.  We  then  use 
DCS-SOMP  to  recover  the  J  signals  jointly.  We  orthogonalize  because  as  the  number  of  iterations 
approaches  M  the  norms  of  the  residues  of  an  orthogonal  pursuit  decrease  faster  than  for  a  non- 
orthogonal  pursuit;  indeed,  due  to  Step  3  the  algorithm  can  only  run  for  up  to  M  iterations.  The 
computational  complexity  of  this  algorithm  is  0(JNM2),  which  matches  that  of  separate  recovery 
for  each  signal  while  reducing  the  required  number  of  measurements. 

Thanks  to  the  common  sparsity  structure  among  the  signals,  we  believe  (but  have  not  proved) 
that  DCS-SOMP  will  succeed  with  c  <  c(S).  Empirically,  we  have  observed  that  a  small  number 
of  measurements  proportional  to  K  suffices  for  a  moderate  number  of  sensors  J .  Based  on  our 
observations,  described  in  Section  5.2.3,  we  conjecture  that  I\  +  1  measurements  per  sensor  suffice 
as  J  — >  00.  Thus,  this  efficient  greedy  algorithm  enables  an  overmeasuring  factor  c  =  (K  +  1)/K 
that  approaches  1  as  J,  K ,  and  N  increase. 

6 We  define  a  mutilated  basis  't’n  as  a  subset  of  the  basis  vectors  from  <f>  =  [</>i,  (j>2 , ,4>n]  corresponding  to  the 
indices  given  by  the  set  Q  =  {ni,  712, . . . ,  hm},  that  is,  $0  =  [<pni ,  </>„2 , . . . ,  <j>nM]-  This  concept  can  be  extended  to 
vectors  in  the  same  manner. 
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Figure  5:  Recovering  a  signal  ensemble  with  common  sparse  supports  (JSM-2).  We  plot  the  probability  of 
perfect  recovery  via  DCS-SOMP  (solid  lines)  and  separate  CS  recovery  (dashed  lines)  as  a  function  of  the 
number  of  measurements  per  signal  M  and  the  number  of  signals  J.  We  fix  the  signal  length  to  N  =  50, 
the  sparsity  to  K  =  5,  and  average  over  1000  simulation  runs.  An  oracle  encoder  that  knows  the  positions 
of  the  large  signal  expansion  coefficients  would  use  5  measurements  per  signal. 

5.2.3  Simulations  for  JSM-2 

We  now  present  simulations  comparing  separate  CS  recovery  versus  joint  DCS-SOMP  recovery  for 
a  JSM-2  signal  ensemble.  Figure  5  plots  the  probability  of  perfect  recovery  corresponding  to  various 
numbers  of  measurements  M  as  the  number  of  sensors  varies  from  J  =  1  to  32,  over  1000  trials  in 
each  case.  We  fix  the  signal  lengths  at  IV  =  50  and  the  sparsity  of  each  signal  to  I\  =  5. 

With  DCS-SOMP,  for  perfect  recovery  of  all  signals  the  average  number  of  measurements  per 
signal  decreases  as  a  function  of  J.  The  trend  suggests  that  for  large  J  close  to  I\  measurements  per 
signal  should  suffice.  On  the  contrary,  with  separate  CS  recovery,  for  perfect  recovery  of  all  signals 
the  number  of  measurements  per  sensor  increases  as  a  function  of  J.  This  occurs  because  each 
signal  experiences  an  independent  probability  p  <  1  of  successful  recovery;  therefore  the  overall 
probability  of  complete  success  is  pJ .  Consequently,  each  sensor  must  compensate  by  making 
additional  measurements.  This  phenomenon  further  motivates  joint  recovery  under  JSM-2. 

Finally,  we  note  that  we  can  use  algorithms  other  than  DCS-SOMP  to  recover  the  signals  under 
the  JSM-2  model.  Cotter  et  al.  [55]  have  proposed  additional  algorithms  (such  as  M-FOCUSS)  that 
iteratively  eliminate  basis  vectors  from  the  dictionary  and  converge  to  the  set  of  sparse  basis  vectors 
over  which  the  signals  are  supported.  We  hope  to  extend  such  algorithms  to  JSM-2  in  future  work. 

5.3  Recovery  strategies  for  nonsparse  common  component 
+  sparse  innovations  (JSM-3) 

The  JSM-3  signal  ensemble  model  from  Section  3.3.3  provides  a  particularly  compelling  motivation 
for  joint  recovery.  Under  this  model,  no  individual  signal  Xj  is  sparse,  and  so  recovery  of  each 
signal  separately  would  require  fully  N  measurements  per  signal.  As  in  the  other  JSMs,  however, 
the  commonality  among  the  signals  makes  it  possible  to  substantially  reduce  this  number.  Again, 
the  potential  for  this  savings  is  evidenced  by  specializing  Theorem  4  to  the  context  of  JSM-3. 
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Corollary  3  If  is  a  random  Gaussian  matrix  for  all  j  G  A,  V  is  defined  by  JSM-3,  and 

YM>  ^  £  Kj  +  KC(T,P)  +  |r|,  re  A, 
ie r  je r 

>  E  K,  +N  +  J-  Kr,  (16) 

j e A  je A 

then  the  signal  ensemble  X  can  be  uniquely  recovered  from  Y  with  probability  one. 

This  suggests  that  the  number  of  measurements  of  an  individual  signal  can  be  substantially 
decreased,  as  long  as  the  total  number  of  measurements  is  sufficiently  large  to  capture  enough 
information  about  the  nonsparse  common  component  zc-  The  term  Kr  denotes  the  number  of 
indices  where  the  common  and  all  innovation  components  overlap,  and  appears  due  to  the  sparsity 
reduction  that  can  be  performed  at  the  common  component  before  recovery.  We  also  note  that 
when  the  supports  of  the  innovations  are  independent,  as  J  — >  oo,  it  becomes  increasingly  unlikely 
that  a  given  index  will  be  included  in  all  innovations,  and  thus  the  terms  Kc(T,  P )  and  Kr  will  go 
to  zero.  On  the  other  hand,  when  the  supports  are  completely  matched  (implying  Kj  =  K ,  j  €  A), 
we  will  have  Kr  =  K ,  and  after  sparsity  reduction  has  been  addressed,  Kc(T ,  P)  =  0  for  all  T  C  A. 

5.3.1  Recovery  via  Transpose  Estimation  of  Common  Component  (TECC) 

Successful  recovery  of  the  signal  ensemble  {xj}  requires  recovery  of  both  the  nonsparse  common 
component  zc  and  the  sparse  innovations  {zj}.  To  help  build  intuition  about  how  we  might 
accomplish  signal  recovery  using  far  fewer  than  N  measurements  per  sensor,  consider  the  following 
thought  experiment. 

If  zc  were  known,  then  each  innovation  Zj  could  be  estimated  using  the  standard  single-signal 
CS  machinery  on  the  adjusted  measurements  y:j  —  Tj zc  =  $jZj.  While  zc  is  not  known  in  advance, 
it  can  be  estimated  from  the  measurements.  In  fact,  across  all  J  sensors,  a  total  of  Ylje A  random 
projections  of  zc  are  observed  (each  corrupted  by  a  contribution  from  one  of  the  zj).  Since  zc  is 
not  sparse,  it  cannot  be  recovered  via  CS  techniques,  but  when  the  number  of  measurements  is 
sufficiently  large  (^JgAMj  N),  zc  can  be  estimated  using  standard  tools  from  linear  algebra. 
A  key  requirement  for  such  a  method  to  succeed  in  recovering  zc  is  that  each  Tj  be  different,  so 
that  their  rows  combine  to  span  all  of  MN .  In  the  limit  (again,  assuming  the  sparse  innovation 
coefficients  are  well-behaved),  the  common  component  zc  can  be  recovered  while  still  allowing  each 
sensor  to  operate  at  the  minimum  measurement  rate  dictated  by  the  {zj}.  A  prototype  algorithm 
is  listed  below,  where  we  assume  that  each  measurement  matrix  has  i.i.d.  A/"(0,  <rj)  entries. 

TECC  Algorithm  for  JSM-3 

1.  Estimate  common  component:  Define  the  matrix  T  as  the  vertical  concatenation  of  the 

regularized  individual  measurement  matrices  &j  =  1  d>; ,  that  is,  T  =  [<f>^ ,  d>2  , . . . ,  &Tj]T ■ 

jaj 

Calculate  the  estimate  of  the  common  component  as  zc  = 

2.  Estimate  measurements  generated  by  innovations:  Using  the  previous  estimate,  sub¬ 
tract  the  contribution  of  the  common  part  from  the  measurements  and  generate  estimates  for 
the  measurements  caused  by  the  innovations  for  each  signal:  yj  =  yj  —  QjZc- 

3.  Recover  innovations:  Using  a  standard  single-signal  CS  recovery  algorithm,7  obtain  esti¬ 
mates  of  the  innovations  Zj  from  the  estimated  innovation  measurements  y.j. 

'  For  tractable  analysis  of  the  TECC  algorithm,  the  proof  of  Theorem  8  employs  a  least-squares  variant  of  f?o-norm 
minimization. 
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4.  Obtain  signal  estimates:  Estimate  each  signal  as  the  sum  of  the  common  and  innovations 
estimates;  that  is,  Xj  =  zc  +  Zj- 

The  following  theorem,  proved  in  Appendix  I,  shows  that  asymptotically,  by  using  the  TECC 
algorithm,  each  sensor  needs  to  only  measure  at  the  rate  dictated  by  the  sparsity  Kj. 

Theorem  8  Assume  that  the  nonzero  expansion  coefficients  of  the  sparse  innovations  Zj  are  i.i.d. 
Gaussian  random  variables  and  that  their  locations  are  uniformly  distributed  on  {1,2,...,  N}.  Let 
the  measurement  matrices  4>j  contain  i.i.d.  A/”(0,  <r|)  entries  with  Mj  >  Kj  +  1.  Then  each  signal 
Xj  can  be  recovered  using  the  TECC  algorithm  with  probability  approaching  one  as  J  — >  oo. 

For  large  J,  the  measurement  rates  permitted  by  Theorem  8  are  the  best  possible  for  any 
recovery  strategy  for  JSM-3  signals,  even  neglecting  the  presence  of  the  nonsparse  component. 
These  rates  meet  the  minimum  bounds  suggested  by  Corollary  3,  although  again  Theorem  8  is  of 
a  slightly  different  flavor,  as  it  does  not  provide  a  uniform  guarantee  for  all  sparse  signal  ensembles 
under  any  particular  measurement  matrix  collection.  The  CS  technique  employed  in  Theorem  8 
involves  combinatorial  searches  that  estimate  the  innovation  components;  we  have  provided  the 
theorem  simply  as  support  for  our  intuitive  development  of  the  TECC  algorithm.  More  efficient 
techniques  could  also  be  employed  (including  several  proposed  for  CS  in  the  presence  of  noise  [25,  63, 
64]).  It  is  reasonable  to  expect  similar  behavior;  as  the  error  in  estimating  the  common  component 
diminishes,  these  techniques  should  perform  similarly  to  their  noiseless  analogues. 

5.3.2  Recovery  via  Alternating  Common  and  Innovation  Estimation  (ACIE) 

The  preceding  analysis  demonstrates  that  the  number  of  required  measurements  in  JSM-3  can  be 
substantially  reduced  through  joint  recovery.  While  Theorem  8  shows  theoretical  gains  as  J  — ►  oo, 
practical  gains  can  also  be  realized  with  a  moderate  number  of  sensors.  In  particular,  suppose 
in  the  TECC  algorithm  that  the  initial  estimate  zc  is  not  accurate  enough  to  enable  correct 
identification  of  the  sparse  innovation  supports  { ST y } .  In  such  a  case,  it  may  still  be  possible  for 
a  rough  approximation  of  the  innovations  {zj}  to  help  refine  the  estimate  zc-  This  in  turn  could 
help  to  refine  the  estimates  of  the  innovations. 

The  Alternating  Common  and  Innovation  Estimation  (ACIE)  algorithm  exploits  the  observa¬ 
tion  that  once  the  basis  vectors  comprising  the  innovation  Zj  have  been  identified  in  the  index  set 
Qj ,  their  effect  on  the  measurements  yj  can  be  removed  to  aid  in  estimating  zc-  Suppose  that  we 
have  an  estimate  for  these  innovation  basis  vectors  in  f 1j.  We  can  then  partition  the  measurements 
into  two  parts:  the  projection  into  span({^jjn}ng^  )  and  the  component  orthogonal  to  that  span. 

We  build  a  basis  for  the  where  yj  lives: 

Qj]> 

where  <3T  ^  is  the  mutilated  matrix  &j  corresponding  to  the  indices  in  Llj,  and  the  Mj  x  (Mj  —  |fl/|) 
matrix  Qj  has  orthonormal  columns  that  span  the  orthogonal  complement  of  4E  ^  • 

This  construction  allows  us  to  remove  the  projection  of  the  measurements  into  the  aforemen¬ 
tioned  span  to  obtain  measurements  caused  exclusively  by  vectors  not  in  f 

V]  =  QjlJj  and  T,  =  Qj$j.  (17) 

These  modifications  enable  the  sparse  decomposition  of  the  measurement,  which  now  lives  in 
to  remain  unchanged: 


26 
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Vj  =  ^2  aiCt>i,n- 

72—1 


Thus,  the  modified  measurements  Y  =  \Tf[  . . .  r/J]  and  modified  measurement  matrix  <F  = 

~  ~  -iT 

<&[  $1  ...  4>  Tj  can  be  used  to  refine  the  estimate  of  the  common  component  of  the  signal, 

S5  =  &Y,  (18) 


where  A ^  =  (A1-  A)  1Ar  denotes  the  pseudoinverse  of  matrix  A. 

When  the  innovation  support  estimate  is  correct  (fij  =  fij),  the  measurements  yj  will  describe 
only  the  common  component  zc-  If  this  is  true  for  every  signal  j  and  the  number  of  remaining 
measurements  ~~  Kj)  —  N,  then  zc  can  be  perfectly  recovered  via  (18).  However,  it  may 

be  difficult  to  obtain  correct  estimates  for  all  signal  supports  in  the  first  iteration  of  the  algorithm, 
and  so  we  find  it  preferable  to  refine  the  estimate  of  the  support  by  executing  several  iterations. 


ACIE  Algorithm  for  JSM-3 

1.  Initialize:  Set  Qj  =  0  for  each  j .  Set  the  iteration  counter  t  =  1. 

2.  Estimate  common  component:  Update  estimate  zc  according  to  (17)-(18). 

3.  Estimate  innovation  supports:  For  each  sensor  j,  after  subtracting  the  contribution  zc 
from  the  measurements,  yj  =  yj  —  &jZc,  estimate  the  support  of  each  signal  innovation  Qj. 

4.  Iterate:  If  £  <  L,  a  preset  number  of  iterations,  then  increment  t  and  return  to  Step  2. 
Otherwise  proceed  to  Step  5. 

5.  Estimate  innovation  coefficients:  For  each  j .  estimate  the  coefficients  for  the  indices  in 


where  z.  ^  is  a  mutilated  version  of  the  innovation’s  sparse  coefficient  vector  estimate  'zj. 
6.  Recover  signals:  Compute  the  estimate  of  each  signal  as  Xj  =  'zc  +  %• 


Estimation  of  the  supports  in  Step  3  can  be  accomplished  using  a  variety  of  techniques.  We 
propose  to  run  a  fixed  number  of  iterations  of  OMP;  if  the  supports  of  the  innovations  are  known 
to  match  across  signals  —  as  in  JSM-2  —  then  more  powerful  algorithms  like  SOMP  can  be  used. 
The  ACIE  algorithm  is  similar  in  spirit  to  other  iterative  estimation  algorithms,  such  as  turbo 
decoding  [65]. 

5.3.3  Simulations  for  JSM-3 

We  now  present  simulations  of  JSM-3  recovery  for  the  following  scenario.  Consider  J  signals  of 
length  N  =  50  containing  a  common  white  noise  component  zc{n)  ~  A/"(0, 1)  for  n  G  {1, . . .  ,  N}. 
Each  innovations  component  Zj  has  sparsity  K  =  5  (once  again  in  the  time  domain),  resulting  in 
Xj  =  zc  +  Zj.  The  signals  are  generated  according  to  the  model  used  in  Section  5.1.6. 

We  study  two  different  cases.  The  first  is  an  extension  of  JSM-1:  we  select  the  supports  for  the 
various  innovations  separately  and  then  apply  OMP  to  each  signal  in  Step  3  of  the  ACIE  algorithm 
in  order  to  estimate  its  innovations  component.  The  second  case  is  an  extension  of  JSM-2:  we  select 
one  common  support  for  all  of  the  innovations  across  the  signals  and  then  apply  the  DCS-SOMP 
algorithm  (Section  5.2.2)  to  estimate  the  innovations  in  Step  3.  In  both  cases  we  use  L  =  10 
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(a)  (b) 

Figure  6:  Recovering  a  signal  ensemble  with  nonsparse  common  component  and  sparse  innovations  (JSM-3) 
using  ACIE.  (a)  recovery  using  OMP  separately  on  each  signal  in  Step  3  of  the  ACIE  algorithm  (innovations 
have  arbitrary  supports),  (b)  recovery  using  DCS-SOMP  jointly  on  all  signals  in  Step  3  of  the  ACIE 
algorithm  (innovations  have  identical  supports).  Signal  length  N  =  50,  sparsity  K  =  5.  The  common 
structure  exploited  by  DCS-SOMP  enables  dramatic  savings  in  the  number  of  measurements.  We  average 
over  1000  simulation  runs. 


iterations  of  ACIE.  We  test  the  algorithms  for  different  numbers  of  signals  J  and  calculate  the 
probability  of  correct  recovery  as  a  function  of  the  (same)  number  of  measurements  per  signal  M. 

Figure  6(a)  shows  that,  for  sufficiently  large  J,  we  can  recover  all  of  the  signals  with  significantly 
fewer  than  N  measurements  per  signal.  As  J  grows,  it  becomes  more  difficult  to  perfectly  recover 
all  J  signals.  We  believe  this  is  inevitable,  because  even  if  zc  were  known  without  error,  then 
perfect  ensemble  recovery  would  require  the  successful  execution  of  J  independent  runs  of  OMP. 
Second,  for  small  J,  the  probability  of  success  can  decrease  at  high  values  of  M.  We  believe  this 
behavior  is  due  to  the  fact  that  initial  errors  in  estimating  zc  may  tend  to  be  somewhat  sparse 
(since  zc  roughly  becomes  an  average  of  the  signals  {xj}),  and  these  sparse  errors  can  mislead 
the  subsequent  OMP  processes.  For  more  moderate  M,  it  seems  that  the  errors  in  estimating 
zc  (though  greater)  tend  to  be  less  sparse.  We  expect  that  a  more  sophisticated  algorithm  could 
alleviate  such  a  problem;  the  problem  is  also  mitigated  at  higher  J. 

Figure  6(b)  shows  that  when  the  sparse  innovations  share  common  supports  we  see  an  even 
greater  savings.  As  a  point  of  reference,  a  traditional  approach  to  signal  acquisition  would  require 
1600  total  measurements  to  recover  these  J  =  32  nonsparse  signals  of  length  N  =  50.  Our  approach 
requires  only  approximately  10  random  measurements  per  sensor  —  a  total  of  320  measurements 
—  for  high  probability  of  recovery. 

6  Discussion  and  Conclusions 

In  this  paper  we  have  extended  the  theory  and  practice  of  compressive  sensing  to  multi-signal, 
distributed  settings.  The  number  of  noiseless  measurements  required  for  ensemble  recovery  is  de¬ 
termined  by  the  dimensionality  of  the  subspace  in  the  relevant  signal  model,  because  dimensionality 
and  sparsity  play  a  volumetric  role  akin  to  the  entropy  used  to  characterize  rates  in  source  cod¬ 
ing.  Our  three  example  joint  sparsity  models  (JSMs)  for  signal  ensembles  with  both  intra-  and 
inter-signal  correlations  capture  the  essence  of  real  physical  scenarios,  illustrate  the  basic  analysis 
and  algorithmic  techniques,  and  indicate  the  significant  gains  to  be  realized  from  joint  recovery.  In 
some  sense,  distributed  compressive  sensing  (DCS)  is  a  framework  for  distributed  compression  of 
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sources  with  memory,  which  has  remained  a  challenging  problem  for  some  time. 

In  addition  to  offering  substantially  reduced  measurement  rates,  the  DCS-based  distributed 
source  coding  schemes  we  develop  here  share  the  properties  of  CS  mentioned  in  Section  2.  Two 
additional  properties  of  DCS  make  it  well-matched  to  distributed  applications  such  as  sensor  net¬ 
works  and  arrays  [51,  52].  First,  each  sensor  encodes  its  measurements  separately,  which  eliminates 
the  need  for  inter-sensor  communication.  Second,  DCS  distributes  its  computational  complexity 
asymmetrically,  placing  most  of  it  in  the  joint  decoder,  which  will  often  have  more  computational 
resources  than  any  individual  sensor  node.  The  encoders  are  very  simple;  they  merely  compute 
incoherent  projections  with  their  signals  and  make  no  decisions. 

There  are  many  opportunities  for  applications  and  extensions  of  these  ideas.  First,  natural 
signals  are  not  exactly  sparse  but  rather  can  be  better  modeled  as  ^-compressible  with  0  <  p  < 
1.  Roughly  speaking,  a  signal  in  a  weak-lp  ball  has  coefficients  that  decay  as  n~1/,p  once  sorted 
according  to  magnitude  [24],  The  key  concept  is  that  the  ordering  of  these  coefficients  is  important. 
For  JSM-2,  we  can  extend  the  notion  of  simultaneous  sparsity  for  £p-sparse  signals  whose  sorted 
coefficients  obey  roughly  the  same  ordering  [66].  This  condition  could  perhaps  be  enforced  as  an 
l p  constraint  on  the  composite  signal 

J  J  J 

XM1)!*  XM2)I>  ■■■>  XMAH 

3= 1  J=1  i=l 

Second,  (random)  measurements  are  real  numbers;  quantization  gradually  degrades  the  recov¬ 
ery  quality  as  the  quantization  becomes  coarser  [34,  67,  68].  Moreover,  in  many  practical  situations 
some  amount  of  measurement  noise  will  corrupt  the  {xj},  making  them  not  exactly  sparse  in  any 
basis.  While  characterizing  these  effects  and  the  resulting  rate-distortion  consequences  in  the  DCS 
setting  are  topics  for  future  work,  there  has  been  work  in  the  single-signal  CS  literature  that  we 
should  be  able  to  leverage,  including  variants  of  Basis  Pursuit  with  Denoising  [63,  69],  robust  iter¬ 
ative  recovery  algorithms  [64],  CS  noise  sensitivity  analysis  [25,34],  the  Dantzig  Selector  [33],  and 
one-bit  CS  [70]. 

Third,  in  some  applications,  the  linear  program  associated  with  some  DCS  decoders  (in  JSM-1 
and  JSM-3)  could  prove  too  computationally  intense.  As  we  saw  in  JSM-2,  efficient  iterative  and 
greedy  algorithms  could  come  to  the  rescue,  but  these  need  to  be  extended  to  the  multi-signal 
case.  Recent  results  on  recovery  from  a  union  of  subspaces  give  promise  for  efficient,  model-based 
algorithms  [66]. 

Finally,  we  focused  our  theory  on  models  that  assign  common  and  innovation  components 
to  the  signals  in  the  ensemble.  Other  models  tailored  to  specific  applications  can  be  posed;  for 
example,  in  hyperspectral  imaging  applications,  it  is  common  to  obtain  strong  correlations  only 
across  spectral  slices  within  a  certain  neighborhood.  It  would  be  then  appropriate  to  pose  a 
common/innovation  model  with  separate  common  components  that  are  localized  to  a  subset  of  the 
spectral  slices  obtained.  Results  similar  to  those  obtained  in  Section  4  are  simple  to  derive  for 
models  with  full-rank  location  matrices. 


A  Proof  of  Theorem  1 

Statement  2  is  an  application  of  the  achievable  bound  of  Theorem  4  to  the  case  of  J  =  1  signal.  It 
remains  then  to  prove  Statements  1  and  3. 


29 


Statement  1  (Achievable,  M  >  2 K):  We  first  note  that,  if  K  >  N/ 2,  then  with  probability 
one,  the  matrix  $  has  rank  N,  and  there  is  a  unique  (correct)  recovery.  Thus  we  assume  that 
K  <  N/2.  With  probability  one,  all  subsets  of  up  to  2 K  columns  drawn  from  T  are  linearly 
independent.  Assuming  this  holds,  then  for  two  index  sets  12  12  such  that  1 12 1  =  1 12 1  =  K , 

colspan)*]?^)  ncolspan(T^)  has  dimension  equal  to  the  number  of  indices  common  to  both  12  and  12. 
A  signal  projects  to  this  common  space  only  if  its  coefficients  are  nonzero  on  exactly  these  (fewer 
than  K )  common  indices;  since  ||0||o  =  K,  this  does  not  occur.  Thus  every  K- sparse  signal  projects 
to  a  unique  point  in  RM. 

Statement  3  (Converse,  M  <  K):  If  M  <  K,  there  is  insufficient  information  in  the  vector 
y  to  recover  the  K  nonzero  coefficients  of  8;  thus  we  assume  M  =  K.  In  this  case,  there  is  a 
single  explanation  for  the  measurements  only  if  there  is  a  single  set  12  of  K  linearly  independent 
columns  and  the  nonzero  indices  of  9  are  the  elements  of  12.  Aside  from  this  pathological  case,  the 
rank  of  subsets  will  generally  be  less  than  I\  —  which  would  prevent  robust  recovery  of  signals 
supported  on  12,  or  will  be  equal  to  K  —  which  would  give  ambiguous  solutions  among  all  such 
sets  12.  □ 


B  Proof  of  Theorem  3 


We  let  _ 

D:=KC  +  J2KJ 

je  a 


(19) 


denote  the  number  of  columns  in  P.  Because  P  €  Vf(X),  there  exists  0  €  such  that  X  =  P0. 
Because  Y  =  TX,  0  is  a  solution  to  Y  =  TP0.  We  will  argue  that,  with  probability  one  over  T, 


T  :=  TP 


has  rank  D,  and  thus  0  is  the  unique  solution  to  the  equation  Y  =  TP0  =  T0. 

We  recall  that,  under  our  connnon/innovation  model,  P  has  the  form  (5),  where  Pc  is  an 
N  x  Kq  submatrix  of  the  N  x  N  identity,  and  each  Pj,  j  S  A,  is  an  Ax  Kj  submatrix  of  the 
N  x  N  identity.  To  prove  that  T  has  rank  D,  we  will  require  the  following  lemma,  which  we  prove 
in  Appendix  C. 

Lemma  2  If  (7)  holds,  then  there  exists  a  mapping  C  :  { 1,2,...,  Kq}  — ►  A,  assigning  each  element 
of  the  common  component  to  one  of  the  sensors,  such  that  for  each  T  C  A, 

KC 

+  (20) 

jer  je  r  k= l 

and  such  that  for  each  k  E  { 1,2, ,  Kc },  the  kth  column  of  Pc  is  not  a  column  of  Pc(k)- 

Intuitively,  the  existence  of  such  a  mapping  suggests  that  (i)  each  sensor  has  taken  enough 
measurements  to  cover  its  own  innovation  (requiring  Kj  measurements)  and  perhaps  some  of  the 
common  component,  ( ii )  for  any  T  C  A,  the  sensors  in  T  have  collectively  taken  enough  extra 
measurements  to  cover  the  requisite  Kc(P,P )  elements  of  the  common  component,  and  (in)  the 
extra  measurements  are  taken  at  sensors  where  the  common  and  innovation  components  do  not 
overlap.  Formally,  we  will  use  the  existence  of  such  a  mapping  to  prove  that  T  has  rank  D. 
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We  proceed  by  noting  that  T  has  the  form 

0 
0 

.  5 

QjPj  . 

where  each  &jPc  (respectively,  &jPj)  is  an  Mj  x  Kq  (respectively,  Mj  x  Kj)  submatrix  of  $>j 
obtained  by  selecting  columns  from  according  to  the  nonzero  entries  of  Pc  (respectively,  Pj). 
In  total,  T  has  D  columns  (19).  To  argue  that  T  has  rank  D,  we  will  consider  a  sequence  of  three 
matrices  To,  Ti,  and  T2  constructed  from  small  modifications  to  T. 

We  begin  by  letting  To  denote  the  “partially  zeroed”  matrix  obtained  from  T  using  the  following 
construction.  We  first  let  To  =  T  and  then  make  the  following  adjustments: 

1.  Let  k  =  1. 

2.  For  each  j  such  that  Pj  has  a  column  that  matches  column  k  of  Pc  (note  that  by  Lemma  2 
this  cannot  happen  if  C(k)  =  j),  let  k'  represent  the  column  index  of  the  full  matrix  P  where 
this  column  of  Pj  occurs.  Subtract  column  k'  of  To  from  column  k  of  To-  This  forces  to  zero 
all  entries  of  To  formerly  corresponding  to  column  k  of  the  block  &jPc- 

3.  If  k  <  Kc,  add  one  to  k  and  go  to  step  2. 

The  matrix  To  is  identical  to  T  everywhere  except  on  the  first  Kc  columns,  where  any  portion  of  a 
column  overlapping  with  a  column  of  &jPj  to  its  right  has  been  set  to  zero.  Thus,  To  satisfies  the 
next  two  properties,  which  will  be  inherited  by  matrices  Ti  and  T2  that  we  subsequently  define: 

PI.  Each  entry  of  To  is  either  zero  or  a  Gaussian  random  variable. 

P2.  All  Gaussian  random  variables  in  To  are  i.i.d. 

Finally,  because  To  was  constructed  only  by  subtracting  columns  of  T  from  one  another, 

rank(To)  =  rank(T).  (21) 

We  now  let  Ti  be  the  matrix  obtained  from  To  using  the  following  construction.  For  each 

j  €  A,  we  select  Kj  +  J2/f=i  1  c(k)=j  arbitrary  rows  from  the  portion  of  To  corresponding  to  sensor 

j.  Using  (19),  the  resulting  matrix  Ti  has 

E  (Ki + E  =  E  K=  +  K°  =  D 

je a  \  k= 1  /  je a 

rows.  Also,  because  Ti  was  obtained  by  selecting  a  subset  of  rows  from  To,  it  has  D  columns  and 
satisfies 

rank(Ti)  <  rank(To).  (22) 

We  now  let  T2  be  the  D  x  D  matrix  obtained  by  permuting  columns  of  Ti  using  the  following 
construction: 

1.  Let  T2  =  [  ],  and  let  j  =  1. 


<h  1  Pc  $\P\  0 

<f2Pc  0  <1>2-P2 

T  = 

(l\jPc  0  0 
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2.  For  each  k  such  that  C(k)  =  j,  let  T\(k)  denote  the  kth  column  of  Yi,  and  concatenate  Yi (k) 
to  Y2,  i.e.,  let  Y2  <—  [Y2  Yi  (/:)].  There  are  Ylk=i  1  c{k)=j  such  columns. 

3.  Let  Y',  denote  the  columns  of  Yi  corresponding  to  the  entries  of  &jPj  (the  innovation  com¬ 
ponents  of  sensor  j),  and  concatenate  Y)  to  Y2,  i.e.,  let  Y2  [Y2  Y^].  There  are  Kj  such 
columns. 

4.  If  j  <  J,  let  j  <—  j  +  1  and  go  to  Step  2. 

Because  Yi  and  Y2  share  the  same  columns  up  to  reordering,  it  follows  that 

rank(Y2)  =  rank(Yi).  (23) 

Based  on  its  dependency  on  Yo,  and  following  from  Lemma  2,  the  square  matrix  Y2  meets  properties 
PI  and  P2  defined  above  in  addition  to  a  third  property: 

P3.  All  diagonal  entries  of  Y2  are  Gaussian  random  variables. 

This  follows  because  for  each  j.  Kj  +  1 c(k)=j  rows  of  Yi  are  assigned  in  its  construction,  while 

Kj  +  Ylk=i  1  C(k)=j  columns  of  Y2  are  assigned  in  its  construction.  Thus,  each  diagonal  element  of 
Y2  will  either  be  an  entry  of  some  'jPj,  which  remains  Gaussian  throughout  our  constructions,  or 
it  will  be  an  entry  of  some  kth  column  of  some  Tj Pc  for  which  C(k )  =  j.  In  the  latter  case,  we 
know  by  Lemma  2  and  the  construction  of  Yo  that  this  entry  remains  Gaussian  throughout  our 
constructions. 

Having  identified  these  three  properties  satisfied  by  Y2,  we  will  prove  by  induction  that,  with 
probability  one  over  $,  such  a  matrix  has  full  rank. 


Lemma  3  Let  T^d  ^  be  a  (d—  1)  x  (d—  1)  matrix  having  full  rank.  Construct 
as  follows: 


jj(d)  := 


yg-l  Vl  ' 

v\  oj  _ 


a  dx  d  matrix 


where  V\,V2  €  Md_1  are  vectors  with  each  entry  being  either  zero  or  a  Gaussian  random  variable, 
to  is  a  Gaussian  random  variable,  and  all  random  variables  are  i.i.d.  and  independent  of  Y^-1). 
Then  with  probability  one,  Y^l  has  full  rank. 


Applying  Lemma  3  inductively  D  times,  the  success  probability  remains  one.  It  follows  that 
with  probability  one  over  <L,  rank(Y2)  =  D.  Combining  this  last  result  with  (21-23),  we  obtain 
rank(Y)  =  D  with  probability  one  over  $.  It  remains  to  prove  Lemma  3. 

Proof  of  Lemma  3:  When  d  =  1,  Y ^  =  [o’],  which  has  full  rank  if  and  only  if  w  0,  which 
occurs  with  probability  one. 

When  d  >  1,  using  expansion  by  minors,  the  determinant  of  Y^  satisfies 

det(Y^)  =  u  ■  det(Y^“1^)  +  C, 

where  C  =  C{ Y^"1)  ,  v\ ,  V2)  is  independent  of  t o.  The  matrix  has  full  rank  if  and  only  if 
det(Y^)  7^  0,  which  is  satisfied  if  and  only  if 

/  -c* 

det(Y(«*-i))- 

By  assumption,  det(Y(d_1))  0  and  a;  is  a  Gaussian  random  variable  that  is  independent  of  C  and 

det(Y(d-1^).  Thus,  uj  /  with  probability  one.  □ 
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C  Proof  of  Lemma  2 


To  prove  this  lemma,  we  apply  tools  from  graph  theory. 

We  seek  a  matching  within  the  graph  G  =  (Vv,Vm,E)  from  Figure  1,  i.e. ,  a  subgraph 
(Vy,  Vm,  E )  with  E  C  E  that  pairs  each  element  of  Vy  with  a  unique  element  of  Vm-  Such  a  match¬ 
ing  will  immediately  give  us  the  desired  mapping  C  as  follows:  for  each  k  €  {1,2,...,  Kc}  C  Vy, 
we  let  (j,  m )  €  Vm  denote  the  single  node  matched  to  k  by  an  edge  in  E,  and  we  set  C(k)  =  j. 

To  prove  the  existence  of  such  a  matching  within  the  graph,  we  invoke  a  version  of  Hall’s 
marriage  theorem  for  bipartite  graphs  [71].  Hall’s  theorem  states  that  within  a  bipartite  graph 
(Hi ,  V2,  E),  there  exists  a  matching  that  assigns  each  element  of  V\  to  a  unique  element  of  V2  if  for 
any  collection  of  elements  n  C  V\ ,  the  set  i?(n)  of  neighbors  of  n  in  V2  has  cardinality  |_E7(H)  |  >  |H| . 

In  the  context  of  our  lemma,  Hall’s  condition  requires  that  for  any  set  of  entries  in  the  value 
vector,  n  C  Vy,  the  set  -F(n)  of  neighbors  of  n  in  Vm  has  size  |i?(n)|  >  |H| .  We  will  prove  that  if 
(7)  is  satisfied,  then  Hall’s  condition  is  satisfied,  and  thus  a  matching  must  exist. 

Let  us  consider  an  arbitrary  set  n  C  Vy.  We  let  E(H)  denote  the  set  of  neighbors  of  n  in  Vyj 
joined  by  edges  in  E,  and  we  let  S'n  =  {j  6  A  :  ( j,m )  G  TF(H)  for  some  m}.  Thus,  Sn  C  A  denotes 
the  set  of  signal  indices  whose  measurement  nodes  have  edges  that  connect  to  n.  It  follows  that 
|£(n)|  =  YljeSn  Mj.  Thus,  in  order  to  satisfy  Hall’s  condition  for  n,  we  require 

E  Mi  ^  lnl-  (24) 

j£Sn 

We  would  now  like  to  show  that  YljeSu  +  Ec{Sn,P)  >  |H| ,  and  thus  if  (7)  is  satisfied  for  all 
T  C  A,  then  (24)  is  satisfied  in  particular  for  S'n  C  A. 

In  general,  the  set  n  may  contain  vertices  for  both  common  components  and  innovation  com¬ 
ponents.  We  write  n  =  n/  U  He  to  denote  the  disjoint  union  of  these  two  sets. 

By  construction,  |Hj |  =  ^  -eSn  Kj  because  we  count  all  innovations  with  neighbors  in  Sn,  and 
because  Sn  contains  all  neighbors  for  nodes  in  n/.  We  will  also  argue  that  Kc(Sy,  P)  >  |nd  as 
follows.  By  definition,  for  a  set  T  C  A,  Kc(T,P)  counts  the  number  of  columns  in  Pc  that  also 
appear  in  Pj  for  all  j  ^  T.  By  construction,  for  each  k  G  He,  node  k  has  no  connection  to  nodes 
(j,  m)  for  j  ^  Sn;  thus  it  must  follow  that  the  kth  column  of  Pc  is  present  in  Pj  for  all  j  ^  Sn,  due 
to  the  construction  of  the  graph  G.  Consequently,  Kc(Sn,P )  >  |nd- 

Thus,  XljeSn  +  Kc(Sn,P )  >  |Hj |  +  |nd  =  |H| ,  and  so  (7)  implies  (24)  for  any  n,  and  so 
Hall’s  condition  is  satisfied,  and  a  matching  exists.  Because  in  such  a  matching  a  set  of  vertices  in 
Vm  matches  to  a  set  in  Vy  of  lower  or  equal  cardinality,  we  have  in  particular  that  (20)  holds  for 
each  T  C  A.  □ 


D  Proof  of  Theorem  4 

Given  the  measurements  Y  and  measurement  matrix  <f>,  we  will  show  that  it  is  possible  to  recover 
some  P  €  Vf{X)  and  a  corresponding  vector  0  such  that  X  =  PQ  using  the  following  algorithm: 

•  Take  the  last  measurement  of  each  sensor  for  verification,  and  sum  these  J  measurements  to 
obtain  a  single  global  test  measurement  y.  Similarly,  add  the  corresponding  rows  of  into  a 
single  row  cj). 

•  Group  all  the  remaining  Mj  —  J  measurements  into  a  vector  Y  and  a  matrix  4>. 
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•  For  each  matrix  P  £  V: 

—  choose  a  single  solution  0p  to  F  =  <f>P0p  independently  of  (j)  —  if  no  solution  exists, 
skip  the  next  two  steps; 

—  define  Xp  =  PQp; 

—  cross-validate:  check  if  y  =  4>Xp ;  if  so,  return  the  estimate  ( P ,  0p);  if  not,  continue  with 
the  next  matrix. 

We  begin  by  showing  that,  with  probability  one  over  <h,  the  algorithm  only  terminates  when  it 
gets  a  correct  solution  —  in  other  words,  that  for  each  P  G  V  the  cross-validation  measurement  y 
can  determine  whether  Xp  =  X.  We  note  that  all  entries  of  the  vector  <f>  are  i.i.d.  Gaussian,  and 
independent  from  $.  Assume  for  the  sake  of  contradiction  that  there  exists  a  matrix  P  €  V  such 
that  y  =  i if>Xp,  but  Xp  =  PQp  ^  X;  this  implies  f{X  —  Xp)  =  0,  which  occurs  with  probability 
zero  over  <f>.  Thus,  if  Xp  ^  X,  then  <f>Xp  ^  y  with  probability  one  over  <F  Since  we  only  need  to 
search  over  a  finite  number  of  matrices  PsP,  cross  validation  will  determine  whether  each  matrix 
P  E  V  gives  the  correct  solution  with  probability  one. 

We  now  show  that  there  is  a  matrix  in  V  for  which  the  algorithm  will  terminate  with  the  correct 
solution.  We  know  that  the  matrix  P*  €  Vf(X)  C  V  will  be  part  of  our  search,  and  that  the  unique 
solution  0p»  to  Y  =  <FP*0p*  yields  X  =  P*Qp*  when  (8)  holds  for  P*,  as  shown  in  Theorem  3. 
Thus,  the  algorithm  will  find  at  least  one  matrix  P  and  vector  0p  such  that  X  =  PQp ;  when  such 
matrix  is  found  the  cross-validation  step  will  return  this  solution  and  end  the  algorithm.  □ 

Remark  2  Consider  the  algorithm  used  in  the  proof:  if  the  matrices  in  V  are  sorted  by  number 
of  columns,  then  the  algorithm  is  akin  to  £q- norm  minimization  on  0  with  an  additional  cross- 
validation  step.  The  i^-norm  minimization  algorithm  is  known  to  be  optimal  for  recovery  of  strictly 
sparse  signals  from  noiseless  measurements. 

E  Proof  of  Theorem  5 

We  let  D  denote  the  number  of  columns  in  P.  Because  P  €  Vp(X),  there  exists  0  €  such  that 
X  =  PQ.  Because  Y  =  <FA,  then  0  is  a  solution  to  Y  =  <I> PQ.  We  will  argue  for  T  :=  <I> P  that 
rank(T)  <  D,  and  thus  there  exists  0^0  such  that  Y  =  T0  =  T0.  Moreover,  since  P  has  full 
rank,  it  follows  that  X  :=  PQ  ^  PQ  =  X. 

We  let  To  be  the  “partially  zeroed”  matrix  obtained  from  T  using  the  identical  procedure 
detailed  in  Appendix  B.  Again,  because  To  was  constructed  only  by  subtracting  columns  of  T 
from  one  another,  it  follows  that  rank(To)  =  rank(T). 

Suppose  T  C  A  is  a  set  for  which  (9)  holds.  We  let  Ti  be  the  submatrix  of  To  obtained  by 
selecting  the  following  columns: 

•  For  any  k  €  {1,2,...,  Kc}  such  that  column  k  of  Pc  also  appears  as  a  column  in  all  Pj  for 
j  £  r,  we  include  column  k  of  To  as  a  column  in  Tp  There  are  Kc(T,P)  such  columns  k. 

•  For  any  k  €  {Kc  +  T  Kc  +  2, . . . ,  D}  such  that  column  k  of  P  corresponds  to  an  innovation 
for  some  sensor  j  €  T,  we  include  column  k  of  To  as  a  column  in  Tp  There  are 

such  columns  k. 

This  submatrix  has  fZjer  Kj  +  Kc(T,P)  columns.  Because  To  has  the  same  size  as  T,  and  in 
particular  has  only  D  columns,  then  in  order  to  have  that  rank(To)  =  D,  it  is  necessary  that  all 
'YfjCpKj  +  Kc(T,P)  columns  of  Ti  be  linearly  independent. 
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Based  on  the  method  described  for  constructing  To,  it  follows  that  Ti  is  zero  for  all  measure¬ 
ment  rows  not  corresponding  to  the  set  T.  Therefore,  consider  the  submatrix  Y2  of  Ti  obtained 
by  selecting  only  the  measurement  rows  corresponding  to  the  set  T.  Because  of  the  zeros  in  Ti,  it 
follows  that  rank(Yi)  =  rank(T2).  However,  since  Y2  has  only  Xjgr  rows;  we  invoke  (9)  and 
have  that  rank(Ti)  =  rank(T2)  <  Xjer  Mj  <  Xje r  Kj  +  Kc{ T,  P).  Thus,  all  Xjer  Kj  +  KcY,  P) 
columns  of  Ti  cannot  be  linearly  independent,  and  so  T  does  not  have  full  rank.  □ 

F  Proof  of  Lemma  1 

Necessary  conditions  on  innovation  components:  We  begin  by  proving  that  in  order  to  recover 
zc,  zi,  and  Z2  via  the  7- weighted  Id -norm  formulation  it  is  necessary  that  z\  can  be  recovered  via 
single-signal  Id-norm  minimization  using  dq  and  measurements  y\  =  Q\Z\. 

Consider  the  single-signal  £i-norm  minimization  problem 

z\  =  argmin  ||zi||i  s.t.  yi  =  <&iz\. 

Suppose  that  this  £ 1  -norm  minimization  for  z\  fails;  that  is,  there  exists  z\  f-  z\  such  that  y\  =  &\Zi 
and  ||Ti||i  <  ||^i||i.  Therefore,  substituting  z\  instead  of  z\  in  the  7-weighted  £i-norm  formulation 
(12)  provides  an  alternate  explanation  for  the  measurements  with  a  smaller  or  equal  modified  £\- 
norm  penalty.  Consequently,  recovery  of  z\  using  (12)  will  fail  and  we  will  recover  x\  incorrectly. 
We  conclude  that  the  single-signal  Id-norm  minimization  of  z\  using  Ti  is  necessary  for  successful 
recovery  using  the  7-weighted  Id-norm  formulation.  A  similar  condition  for  -norm  minimization 
of  Z2  using  $2  and  measurements  $2^2  can  be  proved  in  an  analogous  manner. 

Necessary  condition  on  common  component:  We  now  prove  that  in  order  to  recover  zc, 
z\,  and  Z2  via  the  7-weighted  fd-nornr  formulation  it  is  necessary  that  zc  can  be  recovered  via  single¬ 
signal  Id-norm  minimization  using  the  joint  matrix  [<f>^  ^YY  and  measurements  &JY  ZC- 

The  proof  is  very  similar  to  the  previous  proof  for  the  innovation  component  z\ .  Consider  the 
single-signal  Id-norm  minimization 

zc  =  argmin  ||zc||i  s.t.  yc  =  [<hf  $l]Tzc- 

Suppose  that  this  id-norm  minimization  for  zc  fails;  that  is,  there  exists  zc  f  ZC  such  that  yc  = 
}T zc  and  ll^clli  <  ll^clli.  Therefore,  substituting  zc  instead  of  zc  in  the  7-weighted  Id-norm 
formulation  (12)  provides  an  alternate  explanation  for  the  measurements  with  a  smaller  modified 
id -norm  penalty.  Consequently,  the  recovery  of  zc  using  the  7-weighted  £  1  -norm  formulation  (12) 
will  fail,  and  thus  we  will  recover  x\  and  X2  incorrectly.  We  conclude  that  the  single-signal  £ \  -norm 
minimization  of  zc  using  &2Y  is  necessary  for  successful  recovery  using  the  7-weighted  £\- 
norrn  formulation.  □ 


G  Proof  of  Theorem  6 


We  construct  measurement  matrices  <hi  and  dq  that  consist  of  two  sets  of  rows.  The  first  set  of 
rows  is  identical  in  both  and  recovers  the  signal  difference  x\  —  X2-  The  second  set  is  different  and 
recovers  the  signal  average  ^ x\  +  \x2 •  Let  the  submatrix  formed  by  the  identical  rows  for  the  signal 
difference  be  $£>,  and  let  the  submatrices  formed  by  unique  rows  for  the  signal  average  be 
and  $a,2-  Thus  the  measurement  matrices  <I?i  and  $2  are  of  the  following  form: 


Ti  = 


&A,  1 


and  d>2 


®A,2 
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The  submatrices  <b/},  $,4,1,  and  <&a,2  contain  i.i.d.  Gaussian  entries.  Once  the  difference  x\  —  X2 
and  average  \x\  +  \ X2  have  been  recovered  using  the  above  technique,  the  computation  of  x\  and 
x'2  is  straightforward.  The  measurement  rate  can  be  computed  by  considering  both  parts  of  the 
measurement  matrices. 

Recovery  of  signal  difference:  The  submatrix  4?/)  is  used  to  recover  the  signal  difference. 
By  subtracting  the  product  of  <b//  with  the  signals  x\  and  X2,  we  have 


$dx  1  -  ®dX2  =  $d{x  1  -  x2). 


In  the  original  representation  we  have  x\  —  x2  =  z\  —  Z2  with  sparsity  rate  2 S'/.  But  z\{n)  —  Z2{n) 
is  nonzero  only  if  z\  (n)  is  nonzero  or  Z2  (n)  is  nonzero.  Therefore,  the  sparsity  rate  of  x\  —  X2  is 
equal  to  the  sum  of  the  individual  sparsities  reduced  by  the  sparsity  rate  of  the  overlap,  and  so  we 
have  S(Xi  —  X2)  =  2 S/  —  (S'/)2.  Therefore,  any  measurement  rate  greater  than  c'(2S/  —  (S/)2)  for 
each  <!>/}  permits  recovery  of  the  length  N  signal  x\  —  x2.  (As  always,  the  probability  of  correct 
recovery  approaches  one  as  N  increases.) 

Recovery  of  average:  Once  x\  —  x2  has  been  recovered,  we  have 

1,  ,11  1,  , 

X!  ~  ~(X\  ~  X2)  =  ~ X 1  +  -X2  =  X2  +  -(X!  -  X2). 

At  this  stage,  we  know  x\  —  x2,  &dx  1,  <&dx 2,  <&a,\X\,  and  &a,2X2-  We  have 


$DX  1  -  ~^d{x\  -  x2)  =  <h/5 

^A,lXl  ~  7^A,l(aff  -  X2)  =  $,4,1 

$A,2X2  +  \®A, 2{xi  -  X2)  =  §A,2 


QX1  +  h2) 


where  <b/}(xi  —  X2),  <hyi,i(xi  —  X2),  and  $21,2  (xi  —  X2)  are  easily  computable  because  (xi  —  X2) 
has  been  recovered.  The  signal  \x\  +  ^X2  is  of  length  N ;  its  sparsity  rate  is  equal  to  the  sum  of 
the  individual  sparsities  Sc  +  2 Sj  reduced  by  the  sparsity  rate  of  the  overlaps,  and  so  we  have 
^(^Xi  +  ^X2)  =  Sc  +  2 Sj  —  2ScSj  —  (S'/)2  +  Sc(S/)2.  Therefore,  any  measurement  rate  greater 
than  d{Sc  +  2 S/  —  2 ScS/  —  (S/)2  +  Sc(S/)2)  aggregated  over  the  matrices  4?/),  and  <&a,2 

enables  recovery  of  -7X1  +  -7X2. 

Computation  of  measurement  rate:  By  considering  the  requirements  on  <3?//,  the  individual 
measurement  rates  R\  and  R2  must  satisfy  (13a).  Combining  the  measurement  rates  required  for 
Tai  and  <&a,2,  the  sum  measurement  rate  satisfies  (13b).  We  complete  the  proof  by  noting  that 
c' (■)  is  continuous  and  that  lims^o  c;(S)  =  0.  Thus,  as  S/  goes  to  zero,  the  limit  of  the  sum 
measurement  rate  is  c\S).  □ 


H  Proof  of  Theorem  7 

We  assume  that  T  is  an  orthonormal  matrix.  Like  itself,  the  matrix  also  has  i.i.d.  AA(0, 1) 
entries,  since  T  is  orthonormal.  For  convenience,  we  assume  T  =  In-  The  results  presented  can  be 
easily  extended  to  a  more  general  orthonormal  matrix  T  by  replacing  <3>j  with  d>y  T. 

Assume  without  loss  of  generality  that  Cl  =  {1,2,...  ,K}  for  convenience  of  notation.  Thus, 
the  correct  estimates  are  n  <  K,  and  the  incorrect  estimates  are  n  >  K  +  1.  Now  consider  the 
statistic  £ n  in  (14).  This  is  the  sample  mean  of  J  i.i.d.  variables.  The  variables  (yj:dj,n)2  are  i.i.d. 
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since  each  y3  =  <f>  j Xj ,  and  <!>.,•  and  Xj  are  i.i.d.  Furthermore,  these  variables  have  a  finite  variance.8 
Therefore,  we  invoke  the  Law  of  Large  Numbers  (LLN)  to  argue  that  £n,  which  is  a  sample  mean 
of  (yj,(j)jjn)2,  converges  to  E[(yj,  4>j,n}2]  as  J  grows  large.  We  now  compute  E[(yj,  (t>j.n)2]  under  two 
cases.  In  the  first  case,  we  consider  n  >  K  +  1  (we  call  this  the  “bad  statistics  case”),  and  in  the 
second  case,  we  consider  n  <  K  (“good  statistics  case”). 

Bad  statistics:  Consider  one  of  the  bad  statistics  by  choosing  n  =  K  +  1  without  loss  of 
generality.  We  have 

■  I< 

E[{yj,4>j,K+l)2}  =  E  ^2  Xj (n) {4>j,ni  </>j,K+l) 

_n=  1 

■  K 

=  E  ^2  Xj(n)2 4>jtK+\)2 
_n=  1 

’  K  K 

+  E  E  E  Xj  (£)xj  (n)  <l>j,K+l}{<f>j,n,  4>j,K+ 1) 

n= 1 1=1 
I< 

=  Y^E  [xj(n)2]  E  [(<t>j,n,  4>j,K+ 1)2] 

n= 1 

K  I< 

+  E  E  E\Xj(£)]E[xj(fl)\E  lt)j,K+ 1)]  j 

n=  1 1=1  ,£^n 

since  the  terms  are  independent.  We  also  have  E[xj(n)\  =  E[xj(£)\  =  0,  and  so 

I< 

E[(yj,<t>j,K+ 1)2]  =  Y1E  [XJ'(n)2]  E  [{<t>3,ni4>j,K+l)2] 

n=  1 
K 

=  [(^J>>^,A-+1)2]  •  (25) 

n=l 


To  compute  E  ^E+i)2]  ,  let  4>jtn  be  the  column  vector  [ai,  a2,  •  •  • ,  cim]T ,  where  each  element 

in  the  vector  is  i.i.d.  A/"(0, 1).  Likewise,  let  4>j,K+i  be  the  column  vector  [6i,  62,  •  •  ■ ,  &m]T  where  the 
elements  are  i.i.d.  7V"(0, 1).  We  have 

(<f>j,n,  <l>j,K+ 1)2  =  (dibi  +  02&2  +  •  •  •  +  OM^m)2 

M  M—l  M 

—  ^  '  ^m  ^rn  T  2  ^  ^  '  OjmClrbmbr . 

m=  1  m=l  r=m+l 


8In  [61],  we  evaluate  the  variance  of  (yj,  (j)j,n)2  as 


Var  [{yjAi.n)2] 


M<ta(MMK  +  6  A2  +  28 M2  +  92 M  +  48 A'  +  90  +  2M 3  +  2MI<2  +  AM2K),  n  €  fi 
2MKu4  (MK  +  3A  +  3M  +  6) ,  n  $  fi. 


For  finite  M,  K  and  <r,  the  above  variance  is  finite. 
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Taking  the  expected  value,  we  have 


E  {$j,m  4*j,K+ 1)  —  E 


'  M  "I  I"  M—l  M 

Y  +  2E  Y  Y  dm^r^m  b r 

m=  1  J  [_m=l  r=m-\- 1 

M  M—l  M 

=  ^  ^  E  +  2  ^  ^  ^  ^  E  [amar6m6r] 

ra=l  ra=l  r=<?+l 

M  M—l  M 

=  YeK]EK]+  2Y  Y  E  [am]  E  [Or]  E  [bn]  E[br] 

m—  1  m= 1  r=m+l 

(since  the  random  variables  are  independent) 
M 

=  ^  (1)  +  o  (since  E  [a^]  =  E  [b^]  =  1  and  E  [am]  =  E  [bm\  =  0) 

m=  1 

=  M, 

and  thus 

E  [{(t>j,n,  (f>j,K+\)2]  =  M.  (26) 

Combining  this  result  with  (25),  we  find  that 

K 

E[{yj,<l>j,K+ 1)2]  =  Ya2M  =  MK°2- 

n=  1 

Thus  we  have  computed  E[(yj ,  c^a'+i)2]  and  can  conclude  that  as  J  grows  large,  the  statistic  £a'+i 
converges  to 

E[(yj,cl)j:K+1)2]  =  MK(j2.  (27) 

Good  statistics:  Consider  one  of  the  good  statistics,  and  without  loss  of  generality  choose 
n  =  1.  Then,  we  have 


Ei(yjA. 


'id/ 


=  E 


=  E 


K 


^(i)ll*,.ll2  +  £  xj  (n)  ($3  ,n>  fij,  l) 


Hj,i\ 


n= 2 


+  E 


K 


^  ]  xj(n)  {4* j,ni 


in=2 


K 


(all  other  cross  terms  have  zero  expectation) 


E  [xj(l)2]  E  [||</)j,i||4]  +YE  [a;i(n)2]  E  (t>j,  l)2]  (by  independence) 

n= 2 
K 

g2E  [11^, ill4]  +  Y  ^E  [<&,»> <M2]  •  (28) 


n=2 


Extending  the  result  from  (26),  we  can  show  that  E(<j)j}n,  4>j,i)2  =  M.  Using  this  result  in  (28), 


K 


E[{yjAj,  i)2]  =  +  Ya2M- 


(29) 


n= 2 
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To  evaluate  E  [||</>j,i  || 4] ,  let  (f>3:\  be  the  column  vector  [ci,C2,. 
vector  are  random  W(0, 1).  Define  the  random  variable  Z  = 


j  cm]T ,  where  the  elements  of  the 


Tlf  =  Em=lcm-  Note  that  (*) 


E 

E 


Pj.il 

^j.i 


=  E  [Z2]  and  (**)  Z  is  chi-squared  distributed  with  M  degrees  of  freedom.  Thus, 
=  E  [Z2]  =  M(M  +  2).  Using  this  result  in  (29),  we  have 

Ei(yj,<t>j,i)2]  =  cr2M(M  +  2)  +  (K-l)a2M 
=  M(M  +  K  +  l)cr2. 


We  have  computed  the  variance  of  (y3 ,  (f)]%  \ )  and  can  conclude  that  as  J  grows  large,  the  statistic 
converges  to 

EKVj,#],  i)2]  =  (M  +  K  +  1)M<t2.  (30) 

Conclusion:  From  (27)  and  (30)  we  conclude  that 


lirn  =  E[(yj,(j)j:n)2} 

J — XX) 


(M  +  K  +  l)Ma2,  neD 
KMa2,  n  ^  fi. 


For  any  M  >  1,  these  values  are  distinct  —  their  ratio  is  M+K+l  _  Therefore,  as  J  increases  we  can 
distinguish  between  the  two  expected  values  of  £n  with  overwhelming  probability.  □ 


I  Proof  of  Theorem  8 


Our  proof  has  two  parts.  First  we  argue  that  limj_>00  =  zc-  Then  we  show  that  this  implies 
vanishing  probability  of  error  in  recovering  each  innovation  Zj. 

Part  1:  We  can  write  our  estimate  as 


^  1  7.T  1  \  'iT  1  \  1 

zc  =  -<$>  y  =  E  yj  =  jZ^ 


Mi 


J 


3= 1 


J  ^  Mjo2  ^*jXj  J  S  Mjc)  S  ( 

j= 1  J  3  j=l  J  3  m=  1 


' 

J,™' 


T//,E  x- 


where  (frfm  denotes  the  m-th  row  of  d>j,  that  is,  the  m-th  measurement  vector  for  node  j.  Since 
the  elements  of  each  <J>j  are  Gaussians  with  variance  a2,  the  product  (< has  the  property 


It  follows  that 


and,  similarly,  that 


=  vpN- 


E[(4>f,m)T  <tf,mxj\  =  °jE[Xj}  =  °jE[zC  +  Zj]  =  a]  ZC 


E 


Mj 

\  ^ 


=  ZC- 


Thus,  zc  is  a  sample  mean  of  J  independent  random  variables  with  mean  zc-  From  the  law  of 
large  numbers,  we  conclude  that  limj^oo  zq  =  zc- 

Part  2:  Consider  recovery  of  the  innovation  Zj  from  the  adjusted  measurement  vector  y3  =  y3  — 
Qj^c-  As  a  recovery  scheme,  we  consider  a  combinatorial  search  over  all  E'-sparse  index  sets  drawn 
from  {1,2, . . .  ,  N}.  For  each  such  index  set  ff,  we  compute  the  distance  from  y  to  the  column 
span  of  denoted  by  d(y,  colspan)^-^/)),  where  is  the  matrix  obtained  by  sampling  the 

columns  from  &j.  (This  distance  can  be  measured  using  the  pseudoinverse  of  <F/,n'-) 

For  the  correct  index  set  12,  we  know  that  d(y~j,  colspan^^n))  — >  0  as  J  — >  oo.  For  any  other 
index  set  fT,  we  know  from  the  proof  of  Theorem  1  that  d(yj,  colspan  (<!>,,■  q/))  >  0.  Let 
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C  :=  min  d(^-,colspan($iin/)). 

With  probability  one,  (  >  0.  Thus  for  sufficiently  large  J,  we  will  have  d(yj,  colspan(<hjio))  <  (/ 2 , 
and  so  the  correct  index  set  D  can  be  correctly  identified.  Since  limj^oo  zc  =  zc,  the  innovation 
estimates  z)  =  zj  for  each  j  and  for  J  large  enough.  □ 
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